Yeah, don’t remind me of that debacle. V2 was much more of a resource hog as well. I ran 4 nodes on one array for no reason except that it would get me 4x the traffic and my NAS couldn’t really do more than that. I’m glad that is fixed, at least now the discussion is purely about how to use the space most effectively.
So why did you choose RAID but you’re telling others not to?
With your setup with the NAS and 4 or 5 bays, you don’t need to run RAID, right? Why are you not running 1 node per HDD bay???
Let me answer for @BrightSilence: because the RAID already existed and he uses it for other purposes. And as it happens, he has free space in this array. So why not use it?
I, on the other hand, have a RAID1 BTRFS array consisting of 4 hard disks of mixed sizes for my personal stuff, but 5 nodes on 5 separate hard disks.
Thanks @donald.m.motsinger for answering this! Though, I’m almost certain you already knew this, since almost every time I mentioned I use an array I gave this context. It’s also not a 4 or 5 bay NAS, that would be extremely wasteful for Storj, it’s a 12 bay that I started out having more than 12TB free space on and could still expand some more for Storj as well as other uses.
Additionally, I started my second node on a separate HDD, since I rotated the smaller ones out of the NAS array. I didn’t buy a new NAS or server and used a new array, because again, that would be wasteful. I’m running an individual node on it. So yes, I am putting my money where my mouth is.
I’ve never claimed that arrays are bad in all situations. I’ll also freely admit that it makes things simpler if it’s a question of managing one node vs 10 nodes. But it’s not the most profitable thing to do if you’re using your HDDs only for Storj. And it seems you didn’t contradict that (again).
I found a contrary opinion that I’d like to share with you all:
Please, take a look on this table:
Unfortunately he doesn’t provide any data or solid information to back up his claim. So the article is highly anecdotal.
I agree that the URE rates are probably a little lower in practice than listed. However, if there were a significant difference, don’t you think the manufacturer would actually list that for competitive advantage?
He also seems to be using 1 TB and 2 TB drives in mostly small arrays, which are relatively low risk. And despite that, he is using RAID6 himself on everything but one array, which hosts unimportant data. So he isn’t even practicing what he preaches. Make up your own mind, but I wouldn’t go by this post.
Correct me if I’m wrong, but rebuilding an array means reading every single byte of the surviving disks once, reconstructing the data that was on the failed disk, and then writing that reconstructed data to the replacement.
If we spin this a little further, then what this means is that running a full backup over the same number of disks will have the same failure rate.
I find the numbers in the table unbelievable, but I will go and have a read of the article when I have a spare moment. Perhaps I’m missing something important here.
I read it, but the math is beyond me, I have to take his word for it.
So, 10^14 bits is about 12.5 TB. If that is correct, then a 12 TB drive is effectively a read-once drive.
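To put a number on that claim (a back-of-the-envelope sketch, assuming the advertised 1-in-10^14 URE rate and independent bit errors, which is itself a simplification):

```python
import math

URE_RATE = 1e-14   # advertised unrecoverable read error rate, per bit
TB = 1e12          # decimal terabyte, in bytes

def p_at_least_one_ure(bytes_read: float, rate: float = URE_RATE) -> float:
    """Probability of hitting at least one URE while reading `bytes_read`
    bytes, assuming independent bit errors."""
    bits = bytes_read * 8
    # (1 - rate)^bits, computed in log space to avoid underflow
    return 1 - math.exp(bits * math.log1p(-rate))

# Reading a full 12 TB drive end to end once:
print(f"{p_at_least_one_ure(12 * TB):.0%}")  # → 62%
```

So at the advertised rate, a single full read of a 12 TB drive would hit a URE roughly 62% of the time, which is exactly why the table's rebuild numbers look so scary; real-world URE rates appear to be considerably better than the spec sheet.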
At the risk of sounding anecdotal: I have been running several large RAID arrays professionally for a fair amount of time. Over the years I have seen the sizes and failure modes change. Back when 360 GB disks were the norm (and failure rates were higher), RAID5 or 50 was standard. We swapped a lot of disks, but it was stable, and rebuild times were short.
For me, the tipping point came when 3 TB and 4 TB became the norm. What we found was that whenever we had to rebuild an array, the overall failure rate was unacceptably high. The numbers in the table above are somewhat, but not extremely, exaggerated, I think.
The reason I got out of it, was that the rebuild times were much longer. Most disks in the array have about the same age and are of the same type. One fails, others are also likely to fail soon. During rebuild, disks get hammered. With that hammering, failures tend to appear at a much higher rate than under normal use. So you want to keep the rebuild times short, or have some extra redundancy. And as I implied, we found that with RAID5/50 that redundancy was not enough anymore for the long rebuild times with the bigger disks.
So we took RAID6 as a minimum. One more disk that can fail before you go down. In reality, we moved to RAID60, as the risk then gets spread over the sets. We have been dabbling with RAID70, but we found that support from controller cards was lacking or immature.
Now, what about RAID on 10+ TB disks? Since the individual disk failure rate has gone down, the arrays will be even more homogeneous, making near-simultaneous failures more likely, and the rebuild times will also be much longer. Would we need even more redundancy? Extremely likely.
But honestly, I personally cannot tell from experience. Because we do not run these disks in large arrays anymore. Whenever we do something in-house with HDDs that is not on SAN or NAS (that are all flash now anyway), we use distributed nodes, each running RAID0 or RAID1. Nothing more fancy. Not even RAID10.
The failure modes have shifted: RAM, network, and especially PSUs also fail. Those failure rates have come so close to HDD failure rates that it is more interesting to put the redundancy on a higher level.
Distribution is the norm now.
IMO, if you want to reduce the chance of simultaneous multiple-disk failure, you should use disks from different manufacturers, of different ages, or at least from different batches. The array will be somewhat slower and the chance of a single-disk failure would probably increase, but multiple disks should be less likely to fail at once.
No, because a filesystem-level backup is only going to read space that is actually used, and it doesn’t have to read from every drive because there are either one (RAID5) or two (RAID6) redundant blocks per stripe. So it’s going to read a subset of the surface of each disk, and it doesn’t even have to read from every disk for every file.
Additionally, if one disk does return a URE for a block, that can be silently repaired assuming no other disks return a URE for the same stripe.
- Backup: reads a subset of the data. UREs can be silently corrected; it takes two UREs in the same stripe to cause loss in a RAID5, or three in a RAID6.
- Rebuild: reads all data from all drives (RAID5) or N-1 drives (RAID6). In a RAID5, any URE anywhere kills the whole array. In a RAID6, two UREs on the same stripe kills the array.
Rebuilding an array is very different from running a backup.
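The difference in read volume alone can be made concrete (a sketch of the argument above; the 6-disk array and 30% filesystem utilisation are example figures, not anyone's actual setup):

```python
TB = 1e12  # decimal terabyte, in bytes

def rebuild_read_bytes(disks: int, disk_tb: float) -> float:
    """RAID5 rebuild: every byte of every surviving disk is read,
    regardless of how full the filesystem is."""
    return (disks - 1) * disk_tb * TB

def backup_read_bytes(disks: int, disk_tb: float, utilisation: float) -> float:
    """Filesystem-level backup: only used data blocks are read;
    parity (one disk's worth in RAID5) is never touched."""
    data_disks = disks - 1
    return data_disks * disk_tb * TB * utilisation

d, size = 6, 4.0
print(f"rebuild: {rebuild_read_bytes(d, size) / TB:.0f} TB read")
print(f"backup : {backup_read_bytes(d, size, 0.30) / TB:.0f} TB read")
```

In this example the rebuild reads 20 TB of disk surface while the backup reads only 6 TB of live data, and on top of that the rebuild has no remaining redundancy to correct a URE with.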
If you lose a HDD in RAID 1 or 5, how do you initiate the backup process to possibly repair a URE before you start the rebuild?
Does a rebuild do this by default when a replacement HDD is inserted?
You don’t. A degraded RAID5 or a one-disk RAID1 has no redundancy and therefore cannot suffer a URE without killing the array. (A degraded RAID5 is just RAID0 with extra steps; a one-disk RAID1 is just a disk.)
So what was your point talking about running a backup before a rebuild if you can’t…
RAID 5 and RAID 1 have one HDD of redundancy or fault tolerance. What are you talking about…
RAID 6 and 10 have 2 HDD.
I never said you should run a backup before running a rebuild.
Someone else said that running a backup would stress a RAID as much as a rebuild, so why don’t we see more RAID5 failures when running backups? My response was meant to rebut that point: a backup is not as stressful as a rebuild, and a URE during a backup can probably be transparently corrected, while a URE during a rebuild cannot in the RAID1/RAID5 case.
You missed the key word: degraded. A degraded RAID5 is an array where one disk is missing; at that point it’s effectively a RAID0. A degraded RAID1 is missing at least one drive. Many RAID1 arrays have only two drives, so in those cases, degraded means no redundancy.
Also, let’s not forget that RAID is NOT backup. It’s a convenience that can lower downtime and data loss since last backup.
Backups are about protection from data loss; RAID is about uptime.
Yes, but you can’t back up Storj data, as it changes in real time. So there’s no point talking about backups.
I was not the one initially talking about backups, I was responding to this:
I was simply disagreeing with the second sentence – running a full backup will not have the same failure rate.
The quoted post was trying to imply that the RAID5 rebuild failure rate math is wrong because anyone who takes a backup of data on a RAID5 would have the same failures. My response is simply “no, this is wrong.”
Nobody is suggesting that a storagenode be backed up.