Change from SMR to SSD drive

I noticed that the WD/Hitachi UltraStar HC620s are enterprise-class SMR drives with 14-15 TB

https://www.anandtech.com/show/13523/western-digital-15tb-hdd-ultrastar-dc-hc620

About SSDs, it's my understanding that they will wear out over time due to the limited number of write cycles they can sustain. So I think this is something more to take into consideration.

I'm contemplating building a RAID 0 with two 4TB HDDs. It's the best cost-effectiveness I've found. I also think it's better for them to be 5400rpm drives, for lower noise and power consumption, while countering the lack of performance with the RAID's performance. But since the companies aren't being open about SMR, can someone just recommend some HDDs which are for sure not SMR?

Why would you want RAID 0? One disk failure causes the whole RAID volume to be lost; there is no data redundancy. I agree about SSDs - we had to replace all of our firewall SSDs with HDDs due to early failure from excessive reads/writes.

As far as good HDDs go, I am favoring the data-center-class HDDs that are rated for 7x24x365 service and 550TB/yr. For Western Digital they are branded 'Gold' or 'HC500'.

I've just read a 50+ post bullfight of a topic :wink: about the issue. It's clear there is no consensus. I'm aware of the risk, but I think it is not that high. (I've only had one broken HDD in all my life, and it was like 20 years ago; I assume technology has improved.) I just think it's worth the risk.
2x4TB I guess is a good-sized node, and it's the best combination to get that many TB without going bankrupt. Plus, I get the performance increase of the RAID, which is something to take into consideration as we are talking about 5400rpm drives (I really love having as much silence as possible, and they use less power). I see I'm repeating myself, sorry.

Thanks for that recommendation, but those have a really bad TB/$ ratio. They could be great! The only one I've found is 92€ ($100) and it's 1TB. Definitely not profitable for Storj.
but thanks anyway :slight_smile:

I have seen quite a few failed hard drives. Once both drives in RAID1 developed bad sectors - I managed to sync my data to a new array only because the drives did not have a bad sector in the same place.

It really hasn't. I have quite a few older hard drives that work OK, but newer ones seem to fail faster. New drives have much tighter tolerances than older ones, for example.

If you do not want to lose capacity by using RAID (other than RAID0), you can set up two nodes, one per drive. That way, if one drive fails, it will only take out that one node.


Clearly you never worked with hard drives being used heavily. Most regular people don't really experience hard drive failures because it takes massive amounts of usage to wear down a drive, aside from the one-off dead-on-arrival type deal…

Just like most people won't see a car wear out on them; that doesn't mean it cannot happen, nor that it doesn't happen… it just means they don't really use their car that much, maybe keep it in a nice garage, and don't drive it to bits…

Hard drives wear out, and depending on the use case they can pretty much be dead after 4-5 years. However, most good disks with moderate or even heavy workloads can survive 10 years before they hit a wall; usually in that case the wall is that technology has advanced past them.

Running a RAID 0 might get you through a year or two with a bit of luck… but don't expect it to live beyond that, and that's most likely being generous with the odds… Running a storagenode is a fairly heavy workload on the drives, unless you've got more than a server's worth of drives running it, I would say…
Maybe it's better with 10k or 15k rpm drives… I haven't tried those with it.
But using 5400rpm drives… well, I'm sure they are nice and quiet, but they are also at like 7-9ms seek time or so.
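
For a rough sense of where part of that access time comes from, here's a quick back-of-the-envelope sketch (my own approximation, ignoring seek time and controller overhead):

```python
# Rough estimate of average rotational latency per spindle speed.
# Average rotational latency = half a revolution; seek time comes on top of this.

def avg_rotational_latency_ms(rpm: int) -> float:
    ms_per_revolution = 60_000 / rpm   # one full revolution in milliseconds
    return ms_per_revolution / 2       # on average the head waits half a turn

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} rpm: ~{avg_rotational_latency_ms(rpm):.1f} ms rotational latency")
# 5400 rpm: ~5.6 ms, 7200 rpm: ~4.2 ms, 10k: ~3.0 ms, 15k: ~2.0 ms
```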

And on top of that, you won't know if data is bad before you read it back… and no, you cannot just check 5 minutes of a video file and assume it's good… I know for most people that is fine… video is surprisingly resilient to data loss, and whatever data is lost, our brains disregard anyway…

Also, there are different grades of data… yeah, sure, you want data to be good, but like I say, in a video file you might not even notice if 1MB is missing if it's spread randomly all over the file…
While if that were your bank information, you would damn straight notice… of course most don't have a movie's worth of bank data, but the concept is sound… could be a shorter video… :smiley:

The discussions about RAID vs one node per HDD are about RAID with redundancy. Usually either RAID5 or RAID6. There is absolute consensus on RAID0. No matter on which side of the RAID argument people are, nobody would advise RAID0.

I'm not sure that is representative. It fluctuates a little, but I've seen no indication of it getting significantly worse over time. Backblaze is showing a big improvement in their latest numbers.


Their latest report here: Hard Drive Failure Rates: A Look at Drive Reliability
@frances, rather than looking at your own very small sample size, I recommend looking at stats like this to calculate risk. Keep in mind that Backblaze takes HDDs out of commission after 5 years because failure rates go up quite a bit after 5 years of use.

That's a bit of an exaggeration. Assuming the use of modern disks within their first 5 years of lifetime, there is about a 2% annual failure rate, though this goes up over time. A RAID0 would double the risk as well as double the loss, so roughly a 4% failure rate on average. After 5 years the numbers get a little less solid, but I've seen failure rates mentioned close to 8-10% per year. It's still not smart to use RAID0, because why would you double both the failure rate and the loss when a failure occurs? But it doesn't seem to be as bad as you suggest.
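
To put rough numbers on that "doubles the risk" point, here's a quick sketch assuming independent drive failures and an illustrative 2% annual failure rate:

```python
# Toy estimate of annual RAID0 failure probability, assuming independent drives.
# RAID0 loses everything if *any* member drive fails.

def raid0_annual_failure(per_drive_afr: float, drives: int) -> float:
    survive_all = (1 - per_drive_afr) ** drives   # probability every drive survives the year
    return 1 - survive_all                        # probability at least one fails

afr = 0.02  # ~2% annual failure rate for a healthy drive in its first 5 years
for n in (1, 2, 3):
    print(f"{n} drive(s): {raid0_annual_failure(afr, n):.2%} chance of losing the array per year")
# 1 drive: 2.00%, 2 drives: 3.96%, 3 drives: 5.88% - close to n * 2% for small rates
```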

Well, it's not that I think the drives will die, but with the load a storagenode puts on them, they might… not easy to say without long-term testing. However, it doesn't take too many errors to take down a RAID 0 array, and that's what I think is most likely to happen… but hey, he will get great node performance for a couple of years, maybe more with some luck.

Also, I don't want to quote a number that isn't realistic; it's better that he is pleasantly surprised rather than overly disappointed, and an array that survives doesn't affect things nearly as much as an array dying…

The Backblaze numbers are very interesting… though one has to take into account that they vet their drives in smaller batches before buying large numbers of the models they consider a good investment, so their numbers might not fully represent the real failure rate of HDDs in general.

But I do like to check whether some of their numbers correlate with what I am buying, if possible… their numbers are, of course, a great representation of the drives they do give us numbers for…

On another note, it could be that the drop in HDD failure rates is because of better drives overall. HDDs are a market in decline, and thus manufacturers will need to find new ways to get customers to buy their devices; reliability seems like a sensible marketing and engineering path for HDDs, as the ever-increasing pace of technology jumps toward making them entirely obsolete.


Running a node on each drive would result in the same performance, as the load would be split between the nodes. Also, if a drive failed, it would take out only one node.


Maybe, but how can you know how much success rates will matter once the test data stops…
unless Storj wants to rebalance data around the network all the time to make sure it's all evenly used… which they would essentially have to pay for…

Nobody can currently know whether the higher success rates are worth more or less than the lower end… my bet is on the former, because the faster a connection people have, the more likely they 1. have more money, or 2. have better gear, or 3. are using it for a professional use case.

You could be right, you could be wrong… I doubt anyone can know at present, aside from those controlling the test data, and that cannot last forever…

This is definitely a good point. They have been unable to catch big mistakes in the past, though, and had a few HDD models that performed considerably worse in large-scale use. I wish there were more good sources of HDD stats, but I haven't found any with anywhere close to as much data as Backblaze.

A lot of this also depends on which models they have in use in large numbers. I don't know how representative that is of the broader market, so I'm a little careful about assigning any broader trend to their numbers.

I'm pretty sure in recent years they tend to market storage space per rack and density over reliability, just based on the marketing materials I've seen. Reliability is covered in large part by warranty, especially for large business solutions, for which replacing HDDs basically doesn't cost anything as it always happens within warranty. So unless the failure rates get crazy high, it doesn't matter too much whether they are slightly higher or slightly lower than normal. And consumer customers probably don't pay attention to this at all most of the time. So I'm not convinced reliability is actually that much of a marketable trait; it's more like a baseline expectation.

Well, I didn't catch that, thanks for the clarification. But if one assumes the risk of RAID1, how could it not be the same for RAID0… if I have a 2% risk of disk failure, I can't see a much greater risk of one of two drives failing. Can't remember my statistics lessons, but I imagine we are talking 1%, or 4%. Either way, a good number for me, taking the advantages into consideration.

If I survive for 2 years, I reckon renewing the disks could be a good thing. At that point the node should be generating enough egress for that to be worth it, or else this whole thing is not profitable at all. What's more, in 2-3 years there should be cheaper or bigger disks, comparatively speaking.

Because it's improbable enough, and I'd be getting almost double the I/O speed?

Well, about this I actually have no idea. Is that so? In that case, how can anyone use this configuration? It's pretty commonly used, so… I'm a little confused. This could change all my calculations, of course. Is it really that unreliable?

I thought I had this right. My first intention was to build 2TB nodes. After careful reading and studying, I reached the conclusion that you need more stored data to have a higher probability of egress traffic. So 2 nodes of 4TB would be much worse than one 8TB node, wouldn't they?

Man! I'm discussing RAID in here. I feel like a full member now XD.
Thanks for all your insights.

No, 2x4TB nodes under the same external IP (or the same /24 subnet) would essentially be aggregated into a supernode by the satellite. The data would be split between them, and so would your egress. The total amount of data and egress your two 4TB nodes would get would be exactly the same as if you had one 8TB node (in rare cases you might even get more data).

However, if one drive fails, that node would be disqualified, but the other one would not. If you use RAID0 and a drive fails, your whole big node gets disqualified.

RAID0 - files are split into two and stored on separate drives. One drive fails, all you have is half of each file.
RAID1 - files are stored in two copies - one on each drive. One drive fails, you still have a full copy.
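
A toy sketch of that difference, purely illustrative (real RAID works at the block level, not like this):

```python
# Toy model of striping (RAID0) vs mirroring (RAID1); purely illustrative.
data = b"0123456789ABCDEF"
stripe = 4

# RAID0: alternate 4-byte stripes land on two "drives"
stripes = [data[i:i + stripe] for i in range(0, len(data), stripe)]
drive_a, drive_b = stripes[0::2], stripes[1::2]

# RAID1: each "drive" holds a full copy
mirror_a = mirror_b = data

# Now "lose" the second drive in each layout:
print(b"".join(drive_a))   # b'012389AB' -> only every other stripe survives
print(mirror_a)            # b'0123456789ABCDEF' -> the full file survives
```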


RAID0 was only ever used to get a performance boost when the reliability of the data doesn't matter. But with SSDs relatively cheaply available now, I don't really see RAID0 being used anywhere anymore. There are simply almost no use cases where it makes sense. And yes, it is really that unreliable. A read error on either disk would lead to lost data, and either disk failing would lead to all data being lost. And since there is really no advantage to running one larger node vs two smaller ones, there is no reason to assume that additional risk at all.


I don't monitor success rates, so I have no data to compare to :slight_smile:

As I said, the date when I stopped getting lots of ingress to this node was some weeks after I switched to the SMR drive, so there doesn't seem to be any connection to that at all.

TL;DR: in your case, and with your knowledge, I would say find a 3rd drive and do a RAID5… then you can always lose a drive and simply replace it. If that isn't an option, do two nodes, one on each drive.
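
For anyone curious why RAID5 can tolerate losing a drive, here's a toy sketch of the XOR-parity idea behind it (illustrative only, not how a real implementation works):

```python
# Toy illustration of XOR parity, the idea behind RAID5's single-drive tolerance.
block_a = bytes([0x12, 0x34, 0x56, 0x78])                  # data block on drive A
block_b = bytes([0x9A, 0xBC, 0xDE, 0xF0])                  # data block on drive B
parity  = bytes(a ^ b for a, b in zip(block_a, block_b))   # parity block on drive C

# Drive B dies: rebuild its block from the surviving drive and the parity.
rebuilt_b = bytes(a ^ p for a, p in zip(block_a, parity))
assert rebuilt_b == block_b
print("rebuilt B:", rebuilt_b.hex())
```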

[rants and reasons]
This is a gross simplification, and an erroneous one at that: RAID1 vs RAID0.
RAID0 will suffer horribly when write errors or problems of any kind are encountered; because data is striped across both drives, only one needs to fail to ruin a lot of the data… it might not all get corrupted, but with only a few read/write errors you will see (I don't know exactly) maybe 25-50% of the RAID0 becoming corrupted.

With RAID1/mirror you can lose either disk, or maybe 2-5 or even 10% of it, before the odds are that the damage overlaps with the damage on the second disk and corruption starts to take effect. And since this isn't a striped volume, corruption doesn't hit big stripes across both drives but individual parts of the disk containing individual files or folders; thus losing data on a RAID1 is usually due to outside effects like lightning strikes, water damage, or other general negligence on the sysadmin's side… RAID1 is considered one of the safest ways to store your data.

So if we say a 2% chance of failure per drive, then the odds of a RAID1 failing in the first year, on disks that tested good before being put into production, will be like 2% of 2%, and even that overstates the risk because, even if both drives fail non-catastrophically, you will likely be able to recover most of the data.
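
Spelling out that arithmetic, under the simplifying assumption of independent failures:

\[
p = 0.02, \qquad
P_{\text{RAID1}} = p^{2} = 0.04\%, \qquad
P_{\text{RAID0}} = 1 - (1 - p)^{2} = 3.96\% \approx 2p
\]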

While RAID0, again with good drives, has a 4% chance of catastrophic drive failure, even just the slightest issues with cables, backplanes, general bad sectors, or damage from vibration or accidental bumps can often kill your RAID0 dead in a heartbeat…
But if you are sure you know what you are doing, RAID0 can have its use cases, and since this is only a store for basically replaceable data, sure, you could get away with running a RAID0… for a while… but RAID0 will almost always fail critically or be subject to large amounts of bit rot…

Since you don't seem to know what you are dealing with, it might be a terrible idea if you want to keep a stable node… If you've got two drives, maybe you are better off getting a 3rd one and doing a basic RAID5 with one node, or two separate nodes. On top of that, if you expect to use an old SMR drive for this, then it's doubtful it will truly help your cause… you might be better off looking at some sort of cache solution to make the reads and writes to the drive more sequential.

Yes, RAID0 is the worst and should never be used for stable data storage… it's like a performance-tuned racecar: the mileage isn't great, and it will surely get there fast… but it won't last long…
RAID0 is often used for various temporary data buffers, where the stability of the data is of no concern.


Yeah, I was looking at drives last night because I had one I thought was dead, and found a Seagate that was cheap, but Backblaze was using a version with a 0 instead of a 3 at the end of the model name…
Then when searching around I found another model whose name had an extra 0… so 000 or 0000 and then 3… the difference in price was about triple, for the same size… so yeah, better be careful with those model numbers lol…

Well, you can supposedly get 60TB 2.5" SSDs now… so my thinking was that maybe a high number of rewrites / reliability is the angle, because HDD tech is very established… of course SSDs are basically memory chips in RAID-like configurations or such, so I guess that is also pretty well-known tech… but still, HDDs have changed much less than that technology and don't need to scale the same way… so maybe that is what they will try to market them on in the future: because they have lost the capacity front, now it's only price and a reliably high number of reads/writes.

Of course, HAMR is also on its way for HDDs, which may change things once again… but one would kind of doubt a magnetic version of vinyl records could ever compete with circuit boards printed with laser light…
But I'm not really qualified to guesstimate that… that's just my opinion.

Thanks for the advice.
What are those limitations? I was thinking about buying SSD :confused:

The chief concern is SSD lifetime, or endurance. Considering that storagenodes put constant demand on the storage, the duty cycle is critical.
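
As a rough way to reason about it (the figures below are illustrative assumptions, not specs for any particular drive; check the TBW rating of your actual SSD):

```python
# Rough SSD endurance estimate from a TBW (terabytes written) rating.
# All figures below are made-up examples, not specs for any particular drive.

def years_of_endurance(tbw_rating_tb: float, writes_gb_per_day: float) -> float:
    """Years until the rated write volume is exhausted at a constant daily write load."""
    writes_tb_per_year = writes_gb_per_day * 365 / 1000
    return tbw_rating_tb / writes_tb_per_year

for gb_per_day in (50, 200, 500):
    print(f"{gb_per_day:>3} GB/day -> ~{years_of_endurance(600, gb_per_day):.1f} years on a 600 TBW rating")
```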


Or get a 3rd drive and run a 3rd node on it. :wink:

There's really no reason to use RAID5 unless people are giving you free drives left and right, they all happen to be exactly the same capacity, and you can't manage to fill up your node. Otherwise any redundancy is just wasted capacity that you could be using to run another node.
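
To put numbers on that capacity trade-off, assuming three equal 4TB drives:

```python
# Usable capacity: three 4TB drives as one RAID5 node vs three independent nodes.
drives, size_tb = 3, 4
raid5_usable = (drives - 1) * size_tb   # one drive's worth of capacity goes to parity
separate_nodes = drives * size_tb       # every terabyte can hold node data
print(raid5_usable, "TB usable vs", separate_nodes, "TB usable")   # 8 TB vs 12 TB
```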
