Migrate from RAID5 to RAID6

I really have no other option than to run RAID6. In the future I won’t have to add any more disks, because the ML doesn’t have any more bays, so it would be a configuration to leave in place for “years”.

You’re saying a lot about ZFS, but I have no idea about that file system. I’m using Windows Server 2016, and the file system is going to be the veteran NTFS for the whole RAID6 volume. The server has been operating since 2018 with RAID5, and the system has worked correctly without any problems, even having to rebuild a disk that I thought was broken; I took it out and added it back in without any issue.

The truth is that I am sorry to destroy the Raid 5, but well, you always have to make changes for the better. Greetings!


It’s not 20-30%. RAID5/6 have to go through a read+write cycle on every write. In some cases an 8-disk array in RAID6 would barely be 1.5-2x the write performance of a single disk. It’s definitely nowhere near 6x. With ZFS writing everything twice (first to ZIL, then to array), you have to cut it at least in half for sustained writes.
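For anyone wondering where that ballpark comes from, here’s a rough back-of-the-envelope sketch for small random writes (the per-disk IOPS figure is purely an illustrative assumption, not a measurement, and sequential full-stripe writes do better than this worst case):

```bash
# Assumption: each small RAID6 write costs ~6 disk I/Os
# (read old data + read P + read Q, then write data + P + Q)
DISKS=8
IOPS_PER_DISK=150   # hypothetical 7200rpm figure, illustrative only
echo "single disk:  ${IOPS_PER_DISK} IOPS"
echo "8-disk RAID6: ~$(( DISKS * IOPS_PER_DISK / 6 )) IOPS"   # ~200, i.e. only ~1.3x one disk
```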

You’re right about this one. I was misled by oversimplified diagrams that show specific HDDs as parity HDDs, but parity is actually spread across the disks, like with RAID5/6.

Yeah, the zfs stuff is not relevant for your setup. You can safely ignore it. It’ll be a bit of a chore to make the transition, but I absolutely believe it will be worth it to keep this thing running for a long time to come.


@Robertomcat
yeah if there is one thing we can agree on then it is that raid 5 is a disaster waiting to happen… might not happen for a long long time… but it’s an inherent flaw in the technology

@BrightSilence
did a write test to my pool, gives me an avg of 814MiB/s which is about what i would expect out of 6 drives… i ofc have a dedicated zil… zfs intent log, aka slog / separate log device
but that only takes sync writes currently…

ofc this isn’t representative of random reads/writes…
which i will try to test…not sure how to do that in linux terminal tho :smiley:
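a common way to do that from the terminal is fio, roughly something like this (just a sketch; the test file path is a placeholder on the pool being tested):

```bash
# 70/30 random read/write mix with 4k blocks against a scratch file
fio --name=randrw --filename=/tank/fio-testfile --size=4G \
    --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio \
    --iodepth=16 --runtime=60 --time_based --group_reporting
# adding --direct=1 bypasses the page cache, but older ZFS versions may reject O_DIRECT
```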

Well, I’m going to give you some news that will surprise you (in a bad way, of course): in the end I’m going to keep the RAID5.

With either of the two options for moving to RAID6, I will lose the node either way.

If I do the cloud backup option: I have spent at least four days uploading files, with a total of more than 1.4 million of them. Uploading this kind of file is very slow, and I have done a download test of these same files and it is just as slow; they are very small files and they need a lot of network resources.

Then I did a copy-and-paste test, from the RAID5 to a RAID0 of 2 SSDs, and the transfer rate was very low too. So in both cases I need a lot of hours, at least two or three days, and in that time I would be disqualified.

So, I’m going to leave the RAID5 as it is, forever and ever.

how is your node doing now that the test data has resumed? with my rather crazy setup i just went from like 0 iowait to close to a 10% avg… which is a bit high… but also doing a lot of work it seems
raid5 on an 8 drive 70tb array is a mistake… but it’s your lesson to learn i guess.
raid5 won’t fail today or tomorrow… but it will one day most likely…

Disqualification for downtime is still disabled. Honestly, if you wait now you may not ever get the chance to make this move again. I’d grab this chance if I were you. A few days downtime will barely impact your node right now. It’s up to you of course. But I’d say it’s a golden opportunity.


I don’t know what you mean, but these are today’s figures, and I still have a cancellation rate of 95% or more.

you really should…

the cancelled ones are being recorded wrong, you can’t use those for anything… disregard them


i got the same here… tho my cancelled is like 60 last i checked, but that was ages ago… seems kinda pointless now that i understand how irrelevant they are…

You are absolutely right, and I can still think it over until this Thursday, when I have to physically go to where the server is. It is clear that 8 × 10 TB in RAID5 is an aberration, but well, I’ll have to think hard about whether in the end I change it to RAID6.

if memory serves it was still kinda tiny enough to be on one drive… so find a drive it will fit on with some headroom for maybe a week’s worth of ingress, or limit the node size
then rsync or a similar sync / copy program to a usb external or whatever… when complete, redo the array and throw it all back… i had to do much the same for migrating my own node… was so much fun… took 5 days to copy the 9tb and then i spent 5 more to copy it again because i had made the blocksize 16k and it took forever to scan it… then on the second try i made it 512k blocks… tho 128k is what i should recommend, then it copied the at the time 10 tb node in like a day… world of a difference…
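a rough sketch of that copy-out / copy-back with rsync (mount points are just placeholders; on windows a robocopy /MIR pass would be the rough equivalent):

```bash
# copy the node data out to a temporary usb disk (paths are placeholders)
rsync -aH --info=progress2 /mnt/raid5/storagenode/ /mnt/usb-backup/storagenode/
# rebuild the array as raid6, put a filesystem on it, then copy everything back
rsync -aH --info=progress2 /mnt/usb-backup/storagenode/ /mnt/raid6/storagenode/
# run one final rsync pass with the node stopped so the last changes get picked up
```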

What you’re telling me is this thing in the picture, right? The stripe size, and that the bigger the size, the smaller the total capacity of the disks, although I haven’t done tests.

The stripe size, and also that I can migrate it hot, since the controller allows it. But if I finally decide to switch to RAID6, I don’t know if I would leave it at a big or a small size. When the size is bigger there is better performance, that’s what I understand, but I don’t really know if it will be very effective for the Storj data. As for the audiovisual content I have downloaded to the server, I don’t care if it uses a larger or smaller stripe size; what matters is the 2.1 MB Storj data.


128k is a good avg from what i know… doesn’t perform horribly for anything and usually gets in the 50-70% range of peak performance, so i cannot recommend anything else as i’m not really that well versed in these big block sizes myself… but i’ve been deep diving the subject lately… but again while working with zfs, so not 100% transferable information in all cases… but there is a reason they like to recommend 64k-128k no matter who they are… even microsoft recommends that for almost every possible workload…

from what i can gather higher numbers make my system perform better… but it also means i waste a lot more RAM… maybe a lot more cache… so don’t go too crazy…

128k and i would say max 256k … but you might regret it and i doubt you can change it… i can change it on zfs so i’m not stuck with it even if it may linger for years… :smiley:

else find somebody that is running something similar to you… i think brightsilence is on windows and says 64k… so we are pretty close… would your disk perform better at 256k… maybe… even better at 512k…
maybe… but think about it this way… every time there is a problem… like latency… excess ram usage… cache flushing… then you double it… 64k to 128k… double trouble… the next step to 256k quadruples the effect… the next level 8 times the effect… so maybe 8 times the ram usage… 8 times the cache use, in some cases when it’s not needed…

it can cripple your performance to set this very wrong… so most don’t dare to go off the rails for production usage, which i would also strongly recommend against… maybe ask the forum… see what the consensus is and what their experience says when running nodes… because it will be very specific to the workload… in some cases 16k or even 8k might be best… even if that’s very rare today for most people… and other times one can just max it out because it doesn’t matter, because it’s just storing stuff like hd movies.

We’ve been over this up here, and from what I can see from both you and Bright, I think 128K is a good match.

The truth is that if the files are 2.1 MB, it doesn’t make sense to use a 1024k / 7 MB stripe size. When the disk saves the file within part of a stripe, a lot of space would be wasted.

Learnings from ZFS really can’t be transferred to RAID5/6. A strip size in RAID6 of 64k would lead to a stripe size of 6x64k = 384k in your setup. ZFS doesn’t work like that, so it really doesn’t apply here at all. The impact on RAM is also negligible compared to ZFS. It’s a different beast, so when someone is asking about RAID6, let’s stick to RAID6.

In general larger stripe sizes would be faster for larger files. Smaller stripe sizes are faster for smaller files. The default in most setups is 64k strip size and I would recommend sticking with that as it neatly lines up with the maximum file size for storj. 128k lines up as well, but would probably slow down the database writes and smaller pieces a bit more due to overhead. I don’t think that tradeoff is worth it. So stick with the default. Synology doesn’t even let you change it from that default as far as I know. (Unless you manually edit the config files)

Edit: Changed some terminology to match the terms your controller uses.


well it was really a benchmark of a few hundred systems comparing block size performance over different hardware, using dd to benchmark.
so really they were mostly comparing hardware… ofc it was all zfs users… but still seems very blocksize and hardware related… oh and i actually think they removed their drives from the pool before testing… so no partitions or filesystems, just an empty drive and using dd to test…
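for reference, a raw-drive dd run like that would look roughly like this (the device name is a placeholder, and it overwrites whatever is on that disk, so only on an empty drive):

```bash
# sequential write straight to an empty, unpartitioned drive -- destroys data on /dev/sdX (placeholder)
dd if=/dev/zero of=/dev/sdX bs=128k count=80000 oflag=direct status=progress
# then read it back to get a read throughput figure
dd if=/dev/sdX of=/dev/null bs=128k count=80000 iflag=direct status=progress
```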

don’t think zfs should affect that much… but zfs systems can be a bit weird :smiley:
anyways, testing on non-raided hardware, the best avg seemed to be at 128k when looking at throughput and latency.

ofc since it wasn’t for raid it might not apply… but i just found it very interesting that the best avg benched blocksize across all that hardware was 128k

seemed very in tune with all the stuff i’ve been hearing about using blocksizes over a certain size…
might have been ssds tho lol, not totally done with my deep dive into blocksizes

The author does not have zfs and can’t use it because of Windows. Please do not compare apples and oranges. The experience with zfs is not related to the topic.



actually i hear it works pretty well these days, but it’s been a long road getting there and there are still some features lacking and bugs to be fixed… just like zfs on linux
https://openzfsonwindows.org

also seems like microsoft recommends using 4k blocksizes… so that’s interesting…
but that’s non-raid…

the general consensus seems to be that a 128k stripe size is the best choice if one doesn’t know what to pick… no matter which filesystem or hardware raid one is running…
most likely down to some of the fundamentals of how the disks, software and buses interact with the data as it needs to permeate through all of them and thus gets jumbled around…

like say i know i can run 4-bit bus lanes to my cpu if i want the QPI bus lanes to have redundancy… stuff like these buses will also greatly affect the latency and/or speed of the data being shifted around and processed throughout the system

so yeah 128k is the recommendation for raid stripes… zfs or otherwise…

wasn’t so much apples to oranges… more like tomato, tomato… potato, potato…

I’m a little baffled that that is your response after it has been pointed out several times that the OP is going to stick with NTFS, which on Windows seems like the right choice to begin with. I know you’re just enthusiastic about your setup, but at some point it becomes counterproductive.

So on to the next step. There is no universal answer to this question. It depends on workloads and it heavily depends on the size and type of the array. You link to a forum post of seemingly not very well informed people discussing a 2 disk RAID0 array. I don’t see how that is anywhere close to relevant for an 8 disk RAID6 array. RAID0 doesn’t have to deal with parity calculations on every write, which means it doesn’t have to go through the read-write parity cycle like RAID5 and RAID6 need to. Therefore different optimizations apply. If a RAID0 needs to write something smaller than the chunk size, it just writes to a single disk and is done. RAID5/6 needs to always read from all disks, calculate new parity and write to all disks. So single disk write operations are out of the question anyway.

64k chunk size would already lead to a stripe size of 384k. With a 6+2 array, a 128k stripe size isn’t even an option.
The options for chunk and stripe size would be:
4k/24k
8k/48k
16k/96k
32k/192k
64k/384k
128k/768k
etc.

Because of the file sizes of most storj data, 64k/384k would be a good match.
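As a rough sanity check on that match, here is some illustrative arithmetic (the max piece size is an approximation based on the ~2.1 MB figure quoted earlier, and whether the controller actually batches full-stripe writes depends on the controller and its cache):

```bash
# 6 data disks x 64k chunks = 384 KiB full stripe; a large piece then spans
# several full stripes, which can be written without the read-modify-write cycle.
CHUNK_KIB=64
DATA_DISKS=6
PIECE_KIB=2200   # rough max piece size, assumption based on the ~2.1 MB quoted in the thread
STRIPE_KIB=$(( CHUNK_KIB * DATA_DISKS ))
echo "full stripe: ${STRIPE_KIB} KiB"
echo "full stripes per max piece: $(( PIECE_KIB / STRIPE_KIB ))"   # ~5, the rest is a partial stripe
```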
