RAID5 Question - To rebuild or to migrate to a new cluster, rebuild and move back?

joesmoe · September 7, 2020, 3:48pm

So I’m on a Synology RAID5 with 8x16TB drives, and one fails.

Is there less risk in copying all the data to a new array (I just so happen to have a new one sitting here) instead of having it rebuild? After reading all the horror stories about rebuilding RAID5…

Thanks in advance

littleskunk · September 7, 2020, 5:47pm

It would be better to transition into running 8 individual nodes. You get +16TB of space and the impact of a failure is smaller.

joesmoe · September 7, 2020, 5:56pm

This is understood; however, the array is not only for Storj (although Storj does use most of the space). Using RAID is not optional for this Synology.

kevink · September 7, 2020, 6:15pm

honestly, with 8x16TB you are maxing out the possible storj space anyway so for data security I would go for something like a raidz2/3 but that wasn’t really the question.
Don’t know if copying data from a failing raid5 is safer than rebuilding but with the horror stories in mind, I wouldn’t bet on a new raid5 either.

SGC · September 7, 2020, 8:12pm

the problem with raid5 is that each stripe only has two things to compare, the data and one parity, while raid 6 has three, the data and two independent parities… thus raid6 can in principle work out which drive is wrong, while raid5 can only tell that something is wrong.

not sure most synology runs plain raid5 tho… but more like some sort of hybrid raid5 thing… but not 100% on that, but you might want to look at that…

rebuilding a lost raid 5 disk requires the system to read all the remaining data and parity and compute the lost disk’s contents from it.
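
As a rough illustration of that rebuild step, here is a minimal Python sketch of single-parity (XOR) reconstruction for one stripe. The 4-disk layout and the block contents are made up for the example, and real RAID 5 implementations rotate parity across disks and work on much larger chunks, so treat this as a sketch of the principle only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] dies. Its block is recomputed by reading every
# surviving block in the stripe (data and parity) and XOR-ing them together.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)
```

Every surviving disk has to be read successfully for each stripe, which is why a single unreadable sector on any of them during the rebuild is a problem.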

so it may take longer to rebuild it… than simply copying the data out of the array… ofc you won’t have to bother with having space to move the data to meanwhile… so there are both benefits and disadvantages to both solutions…

if you are running plain raid5 with that many disks you would be better off running raid6
also because the disks in a raid run in harmony / sync with each other, it means that your entire raid5 array has roughly the iops of a single disk… which isn’t great

tho i know many people like to run without redundancy, i think it’s worth the hassle… just because one will not have to start over for minor issues, but you may run into some kind of iops limitation as your node grows in size

running multiple nodes would give you x8 the iops… and 1/7th extra disk space… ofc you would then be completely exposed to data errors, but storj isn’t too bad about that and it might be much easier to keep running that way… at least on a synology.

the smaller the raid array the less the problem becomes ofc… if you are running a hybrid raid5 then you may have some mitigation of the regular raid5 issues, like checksums which makes the system able to locate which disk is in error and thus not overwrite the correct data when data is corrupted…

if you have that kind of raid5 then running something like 4 with 1 redundant is a pretty good spot imo… i know it costs extra space, but with 2x raid5 arrays like that you would have x2 the iops, and 1/7th less data again… ofc if you don’t lack iops or so then migrating to running raid6 might be my preferred choice…
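
The space figures above can be checked with some quick arithmetic. This is a small Python sketch that only counts whole disks lost to parity; the helper name `usable_tb` is made up for the example, and it ignores filesystem overhead, TB versus TiB, and whatever a Synology hybrid layout does differently.

```python
SIZE_TB = 16

def usable_tb(disks, parity_disks, size_tb=SIZE_TB):
    """Usable capacity when `parity_disks` worth of space goes to redundancy."""
    return (disks - parity_disks) * size_tb

layouts = {
    "raid5, 8 disks": usable_tb(8, 1),            # 112 TB
    "raid6, 8 disks": usable_tb(8, 2),            # 96 TB
    "2x raid5, 4 disks each": 2 * usable_tb(4, 1),  # 96 TB
    "8 separate disks": usable_tb(8, 0),          # 128 TB
}

baseline = layouts["raid5, 8 disks"]
for name, tb in layouts.items():
    # Change relative to the existing 8-disk raid5, in sevenths of its capacity.
    sevenths = (tb - baseline) * 7 / baseline
    print(f"{name}: {tb} TB ({sevenths:+.0f}/7 vs 8-disk raid5)")
```

So 8 separate disks come out at +1/7 compared to the current array, and both raid6 and two 4-disk raid5 arrays come out at -1/7.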

TL;DR
if you got regular raid5 and not some fancy hybrid then go raid6
if you need more iops split up the array or like littleskunk suggests make 8 nodes on the 8 drives for x8 the iops.

if i had the room i would copy out the data… ofc the max space you can use on storj will keep you limited at maybe 40tb max data anyways without multiple ip’s
and could take years to reach it… if it doesn’t cap out at 30tb nobody really knows how high avg deletion ratio will be.

Sasha · September 11, 2020, 3:42am

People recommend changing RAID 5 to RAID6, however running 8 individual nodes with 16TB on 1 NAS instead of 1 node with 8x16TB, also means:

Cons:

You need to maintain 8 individual nodes rather than 1.
Possible 8x the increase in CPU and RAM usage requirement for 8 nodes.
Possible 8x I/O queue or overhead increase
8 different reputations to maintain.
If all 8 docker images are going through 1 IPv4 address, then you’re not really increasing the possibility of more storage or bandwidth that you would receive from the satellites.

Pros:

Lose 1 HDD, only 1 node is lost. Rather than possibly the whole RAID5 array on rebuild.

Alexey · September 11, 2020, 6:54am

They will share both resources. Since the traffic is 8 times lower for each of them, each node uses roughly 8 times less of all other resources…

donald.m.motsinger · September 11, 2020, 8:26am

Pros:

More available space

BrightSilence · September 11, 2020, 3:10pm

Also means you’re spreading your risk.

That’s not a con, it’s just not a pro. So neutral.

Your cons are kind of falling apart.

But, you’re never going to fill up 8x16TB anyway, so feel free to make a RAID 6 setup with that to make it a little simpler. RAID 5 is a death sentence on this number of large HDDs.

joesmoe · September 12, 2020, 3:03pm

Remember guys, I wasn’t asking about RAID5 versus RAID6. The normal use requirements of this machine are that it’s RAID5. That isn’t optional. For their use, it is all they desire.

The question is whether it’s less hard on a failing 8 drive array to just transfer all the data over, or to rebuild.

I have two arrays at the same location. I would not even need to copy the data back in order to restart the Storj nodes.

I think the answer is that transferring it over is potentially less work on the original drives.

BrightSilence · September 12, 2020, 10:32pm

I would say so, yes. But either operation is very risky.

I can’t think of any reason why RAID 5 would be desired with such a setup. You’d be better off just spanning data at that point, since at least then you only lose the data on the disk that failed. You either care about the data on there or you don’t. If you don’t care, why waste an HDD on redundancy to begin with? And if you do… you should actually be protecting the data.

I hope the copy goes well. I really do. But the odds are against you.

Sasha · September 14, 2020, 2:31am

Those were the points I took away from the numerous threads on RAID 5 vs multiple standalone nodes on 1 box.

BrightSilence · September 14, 2020, 7:51am

Well, if the rebuttal was missing in those original topics, I hope it helped here. There are basically just 2 cons. You have to manage multiple nodes and you have to go through multiple vetting phases. The last one can be mitigated mostly by just making sure you always have a vetted node with space available on all satellites. As a rule of thumb you could start a node about every month.

There’s also another pro not listed. There will be a much lower number of reads and writes per HDD. On RAID, every read and write, if large enough, will still hit all HDDs, while with separate nodes each node only has to deal with 1/8th of the reads and writes. This can be especially useful on SMR HDDs, which would buckle under normal node loads.

SGC · September 16, 2020, 6:09am

the more drives you have in an array the more time it will take to read the entire array because all drives have to be read and processed to create the new data… which then also has to be written to the new drive…

in theory the difference wouldn’t be much… like say if you got a 4 drive raid5 then it will only need to read 3 drives to create the data for the 4th replacement drive… however the more drives you add the more often you run into bottlenecks.

like limited processing to create the data for the new disk… processing say 7x16tb is a non trivial matter, thus it may end up taking days upon days, which then increases the odds of having a secondary read error or worse.
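
To put a rough number on the odds of a read error during the rebuild, here is a back-of-the-envelope Python sketch. The error rates are assumptions, not figures from this thread: 1 unrecoverable read error per 1e14 or 1e15 bits are typical datasheet worst-case values, and real drives often do considerably better. It also assumes errors are independent, which is a simplification.

```python
import math

def p_read_error(bytes_read, errors_per_bit):
    """Chance of at least one unrecoverable read error, assuming independent
    errors at a constant per-bit rate (Poisson approximation)."""
    expected_errors = bytes_read * 8 * errors_per_bit
    return 1 - math.exp(-expected_errors)

TB = 10**12
rebuild_bytes = 7 * 16 * TB  # every surviving disk is read in full

# Assumed datasheet-style rates, not measured values.
for rate in (1e-14, 1e-15):
    print(f"rate {rate:g}/bit: {p_read_error(rebuild_bytes, rate):.1%}")
```

Under these assumptions the risk grows with the amount of data that has to be read, which is the main reason wide raid5 arrays of large disks get such a bad reputation.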

also in many cases the entire volume will be read, even the empty parts of the disks… thus a rebuild may take around 10x as long as a copy if you only need to extract 10% of the volume due to the rest being empty space…
it’s really very much a matter of how well you know your setup… but in almost all cases i would say that for an 8 disk raid5 copying the data to another array is the safer and faster option.
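
A quick Python sketch of how much has to be read from the surviving disks in each case. It assumes the rebuild reads every surviving disk in full regardless of how full the volume is, and that a copy from the degraded array only touches stripes that hold data; the function name and the 10% fill level are just for illustration. Some implementations only rebuild used space, in which case the gap shrinks.

```python
def survivor_reads_tb(disks, size_tb, used_fraction):
    """Approximate TB read from the surviving disks for a full rebuild
    versus copying only the used part of the volume off a degraded array."""
    surviving = disks - 1
    rebuild = surviving * size_tb             # whole disks, empty space included
    copy = surviving * size_tb * used_fraction  # only stripes that hold data
    return rebuild, copy

rebuild, copy = survivor_reads_tb(disks=8, size_tb=16, used_fraction=0.10)
print(f"rebuild reads ~{rebuild} TB, copy reads ~{copy:.1f} TB")
```

This only compares read volume. A copy of many small files can be much slower per TB than a sequential rebuild, so the time difference may be smaller than the read difference suggests.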

if one didn’t have so many disks in a raid5 array then it would be much safer to simply rebuild, ofc it does save a lot of overhead costs… but really… at what cost… i think the sweet spot is about 4 drives in a raid5 type setup

maybe 5… and ofc as your redundancy gets cheaper your rebuild bandwidth goes up