Migrate from RAID5 to RAID6

My cancellation rate is really high.

My node has been active since May 5 and runs on an HPE ML110 Gen9 server with 64 GB of RAM. Windows Server 2016 sits on a RAID 1 pair of SSDs (managed by the integrated B140i controller), and the Storj data is hosted on a RAID 5 of 8 × 10 TB disks (a configuration that long predates the node; the disks sit in two hot-plug cages), managed by a P440 hardware controller with 4 GB of cache, all of it dedicated to writes. The Internet connection is 600/600 Mbps and the server is located in Valencia, Spain.

I have analyzed the 990 MB of logs accumulated since the node came online, and the result is this:

Let's see if anyone can give me some information, or whether I should open a separate post. Thank you.
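
For anyone who wants to reproduce this kind of breakdown, a short script along these lines should work (a rough sketch, not an official Storj tool; it assumes the default storagenode log format, where transfer results are logged with phrases like "uploaded", "upload canceled", "downloaded" and "download canceled"):

```python
# Rough sketch: count successful vs. canceled transfers in a storagenode log.
# Assumes the default plain-text log format; adjust the phrases if your log differs.
from collections import Counter

LOG_PATH = "storagenode.log"  # hypothetical path, point it at your own node log

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        for tag in ("upload canceled", "download canceled", "uploaded", "downloaded"):
            if tag in line:
                counts[tag] += 1
                break  # count each line under one tag only

uploads = counts["uploaded"] + counts["upload canceled"]
if uploads:
    print(f"upload cancel rate: {counts['upload canceled'] / uploads:.1%}")
print(dict(counts))
```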

RAID 5 with large disks is going to be a problem later. However, it may be a problem right now with write speed.

If you have nothing else running on the server, it may be a good idea to scrap the current configuration and run either RAID 10 or ZFS.

I don't think I have write problems, because the controller already takes care of writing efficiently; besides, there is not enough traffic to saturate writes.

Right now I can't change the parity layout (and I don't think I will in the future either), because I would have to move a lot of data besides Storj. I know that RAID 5 is not the best thing when it comes to keeping data safe, but right now there is no problem with reads or writes. What I don't like is the cancellation rate I have on the node; it's terrifying. Right now I have 2.3 TB of files stored on the node.

Without write cache enabled, RAID 5 write speed is going to be quite slow on very large SATA drives. However, with write cache enabled, some low-level data loss is possible. I believe Storj recommends disabling write caches to avoid silent data loss/corruption, which will eventually lead to node disqualification.

You also mention that there’s a “controller” involved. If this is a reference to a hardware RAID 5 controller, there could be some serious drive problems hidden in the abstraction layer of the controller. In essence, your server storage is very likely to fail quite disastrously.

The write cache of a hardware controller is there simply to make writes as fast as possible, not to do what you are describing. I have dedicated the entire cache to writes because I have no hot data: everything I store is read randomly, and there is no frequently accessed data that would benefit from being served out of the controller cache.

Do you really think these enterprise-grade hardware controllers were built to allow low-level data loss? I think absolutely not; that would be a disaster for any company.

Here are the results of a test:

I had the exact same reasoning when starting out as an SNO. You can find my misinformed posts from earlier threads, arguing fairly similar points.

However, I have come to the conclusion that @Alexey is correct in regard to RAID 5… and the hardware controller makes matters worse.

The problem isn't the writing; the problem is RAID 5 and how much data your array will have to read to resilver/rebuild after losing a drive…

Let's say you lose a drive… then you have to rebuild it, which will take about 14 hours under perfect conditions, and most likely much, much longer. On top of that, the array may have been running for a year or two, and meanwhile another drive may have gone bad without actually dying… then when you start stressing the array during the rebuild, it dies…

Of course that's the worst that could happen… so what happens on average? Well, you lose a drive, replace it and rebuild it… but the rebuild requires reading 70 TB from what is now essentially a RAID 0, and a single unrecoverable bit error is big trouble in most cases. Manufacturers rate older drives at about one read error per 10^14 bits read (I think it's bits, anyway).
A TB is 10^12 bytes, so reading 70 TB puts you well past that threshold: a bit error is not just likely, it should be considered the norm.
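
To put rough numbers on that (a back-of-the-envelope sketch using the commonly quoted 1-error-per-10^14-bits spec, not anything measured on this array):

```python
import math

# Expected unrecoverable read errors (UREs) when the whole degraded array
# has to be read during a rebuild. The URE rate is an assumed datasheet figure.
URE_RATE_BITS = 1e14      # bits read per expected URE (older/consumer-class spec)
rebuild_read_tb = 70      # the 7 surviving 10 TB drives, read end to end

bits_read = rebuild_read_tb * 1e12 * 8        # TB -> bytes -> bits
expected_ures = bits_read / URE_RATE_BITS     # ~5.6 expected read errors
p_clean = math.exp(-expected_ures)            # Poisson estimate of a URE-free rebuild

print(f"expected UREs during rebuild: {expected_ures:.1f}")  # 5.6
print(f"chance of a URE-free rebuild: {p_clean:.2%}")         # ~0.37%
```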

So let's assume it all goes well and you actually manage to rebuild the array… but unknown to you, another drive has gone bad and is just slowing you down. Then you might get 20% of the expected speed, or even 10%, on average, turning your 14-hour rebuild into a 3 to 6 day rebuild…
3 to 6 days during which you are effectively running RAID 0.
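
The same kind of napkin math for the rebuild time (assumed speeds, just to show how the hours turn into days):

```python
# How a rebuild stretches when a tired drive drags the array down to a
# fraction of full speed. 200 MB/s is an assumed optimistic sustained rate
# (it roughly matches the "14 hours" figure for a 10 TB drive).
DRIVE_TB = 10
FULL_SPEED_MB_S = 200

def rebuild_days(speed_fraction: float) -> float:
    seconds = DRIVE_TB * 1e6 / (FULL_SPEED_MB_S * speed_fraction)
    return seconds / 86400

for fraction in (1.0, 0.2, 0.1):
    print(f"{fraction:>4.0%} of full speed -> {rebuild_days(fraction):.1f} days")
# 100% -> 0.6 days (~14 h), 20% -> 2.9 days, 10% -> 5.8 days
```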

So let's say you are the lucky winner and no drive actually fails in your RAID 5 array… but you get some bad data back. Now the controller has two copies of the same data that are slightly different, and it has no way to identify which one is bad and which is good…

So it looks at it and says: "Well, this drive has had a bad sector in the past, so that must be the bad one," then it overwrites your good data with corrupt data and continues happily on its way, informing you that it has solved a problem… while only making it worse…
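
A toy example of why single parity can detect a mismatch but cannot tell which block is the corrupt one (nothing like the P440's real internals, just the XOR idea):

```python
from functools import reduce

# RAID-5-style single parity over a pretend 3-drive stripe.
data_blocks = [0b1010, 0b1100, 0b0110]
parity = reduce(lambda a, b: a ^ b, data_blocks)

# Silently corrupt one data block (bit rot, misdirected write, ...).
corrupted = data_blocks.copy()
corrupted[1] ^= 0b0001

mismatch = reduce(lambda a, b: a ^ b, corrupted) ^ parity
print("parity check:", "mismatch" if mismatch else "ok")  # mismatch: an error is detected
# But the check would look exactly the same if the parity block itself were the bad one,
# so without per-block checksums (as ZFS has) the controller can only guess what to
# "repair", and a wrong guess overwrites the good copy.
```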

RAID 6 is the recommended minimum for conventional RAID…
You might get away with a RAID-5-type solution in ZFS… but ZFS is slower in IOPS, and your array already has pretty terrible IOPS because all the drives have to be striped in sync. That gives your array roughly the IOPS of a single HDD, assuming all the drives behave as they should, and with 8 of them the odds that one doesn't are of course 8 times higher…
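
Rough numbers for the IOPS point (assumed per-drive figures; random IOPS scale with the number of independent stripes/vdevs, not with the drive count inside one stripe):

```python
# Assumed ~120 random IOPS for a 7200 rpm SATA drive.
HDD_IOPS = 120

def pool_iops(num_vdevs: int) -> int:
    # Each synchronously striped group delivers roughly one drive's worth of random IOPS.
    return num_vdevs * HDD_IOPS

print(pool_iops(1))  # one 8-wide RAID 5 / raidz1 stripe: ~120 IOPS
print(pool_iops(3))  # three 3-drive raidz1 vdevs:        ~360 IOPS
```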

So yeah, make it RAID 6 and it should all be good… sounds like a nice setup… I sure wouldn't mind having 8 × 10 TB drives :smiley:

Well, that’s all I can do for now. If I changed the parity system in the future, I would put in a RAID 6, which is also recommended around here, but I can’t do anything else at the moment.

I can only ask here what the best configuration would be. Thank you!

I'm sure a good LSI controller will attempt to make it work… but the fundamental math is simply against the methodology. Wendell from Level1Techs on YouTube explains RAID pretty well, and he also did some very in-depth experiments injecting bad data directly into RAID arrays, ZFS pools and Btrfs to record how they behaved… or didn't behave… :smiley:

Just got this puppy up and running this morning, lol.
raidz1 is basically RAID 5 but with checksums, so it can detect and identify errors…
Aside from that, I only need to read 2 drives to rebuild 1… not 7.
And they are mostly only 6 TB, lol; there's a good deal of 3 TB drives in there, which I will be replacing with 6 TB as I get around to it.

The reason for this layout is that I get 3 times the IOPS this way, so I can handle thousands of raw reads, and all writes go to a dedicated SSD first and are then written to the HDDs sequentially every 5 seconds.

 -                                                 capacity     operations     bandwidth
pool                                             alloc   free   read  write   read  write
-----------------------------------------------  -----  -----  -----  -----  -----  -----
rpool                                            55.3G  83.7G      0     22      0   342K
  ata-OCZ-AGILITY3_OCZ-B8LCS0WQ7Z7Q89B6-part3    55.3G  83.7G      0     22      0   342K
-----------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                             13.6T  19.1T      1    194  4.80K  2.84M
  raidz1                                         8.16T  8.20T      0     56  2.53K  1.08M
    ata-HGST_HUS726060ALA640_AR11021EH2JDXB          -      -      0     18  1.03K   370K
    ata-HGST_HUS726060ALA640_AR11021EH21JAB          -      -      0     18    784   369K
    ata-HGST_HUS726060ALA640_AR31021EH1P62C          -      -      0     19    750   369K
  raidz1                                         5.43T  2.74T      1     59  2.27K   401K
    ata-TOSHIBA_DT01ACA300_531RH5DGS                 -      -      0     20    614   134K
    ata-TOSHIBA_DT01ACA300_99PGNAYCS                 -      -      0     19  1.13K   134K
    ata-TOSHIBA_DT01ACA300_Z252JW8AS                 -      -      0     19    545   133K
  raidz1                                         2.08G  8.17T      0     54      0   889K
    ata-HGST_HUS726060ALA640_AR31051EJS7UEJ          -      -      0     16      0   296K
    ata-HGST_HUS726060ALA640_AR31051EJSAY0J          -      -      0     17      0   296K
    ata-TOSHIBA_DT01ACA300_99QJHASCS                 -      -      0     20      0   296K
logs                                                 -      -      -      -      -      -
  ata-OCZ-AGILITY3_OCZ-B8LCS0WQ7Z7Q89B6-part5    18.2M  4.48G      0     24      0   514K
cache                                                -      -      -      -      -      -
  ata-Crucial_CT750MX300SSD1_161613125282-part1  3.55G   596G      1      2  6.90K  17.8K
-----------------------------------------------  -----  -----  -----  -----  -----  -----

Yes, you are right in everything you write, but from the beginning the data I put on that storage was not of great importance, although it was still data.

When I started the node I didn't think I would get hooked on this, but over time I've seen that there is a very interesting community around Storj; time has passed, and in the end everything has stayed on a RAID 5. The problem is that right now I have 2.3 TB of Storj data, and if I made a graceful exit and then restarted the node, I think I would lose the identity and all of its "age".

I should add that the hardware controller constantly scans the disks for errors, and when it finds something strange in a sector it corrects it, remaps it, or does whatever it has to do to avoid corrupting the data. At the moment all the disks report healthy.

I've rebuilt and redesigned my planned setup maybe close to 10 times by now… only really remade it 3-4 times though… hopefully this time it will stick. I also started with RAID 5 on an LSI RAID controller… at least you've got that going for you… LSI is the best.

I tried to do the migration from RAID 5 to RAID 6, since there is plenty of free space in the current RAID 5, but the controller only lets me go to RAID 0.

You should be able to shrink the array by 1 drive while it is still running RAID 5 and then change it to RAID 6.
But you really need to be familiar with the sequence of steps to make it do that… IMO the LSI software isn't always top of the line, even if their controllers are super reliable and protect your array against user errors…

You might be able to find a guide to it somewhere… I think the P440 is like an LSI 9260 or something like that… though HP might have their own software for it… I haven't really tried the HP version.

Through the HP SSA tool it only lets me go from RAID 5 to RAID 0. I can also change the stripe size.

(screenshot from HP SSA, 2020-06-16)

What HP server do you have? And which RAID controller?

The server is an ML110 G9 with a P440 4GB controller.

Well, it seems odd that one would have to go to RAID 0 in order to get to RAID 6… but maybe the math is simpler that way.

Check whether that is really how it's done… I'm sure it's stated very clearly in the controller documentation, which you should easily be able to find online.

There is nothing else inside the SSA administration tool, only what I showed in the image above.

I was also told that since the server did not natively support RAID 6, the controller could not provide it either, but I find this absurd. Some time ago I had a Gen8 MicroServer that didn't natively support RAID 5 either, but the controller could do it anyway.

I was talking to a guy on the official forum, and he told me about this, which I didn’t believe.

Well, it sounds like bullshit… though they may have some other reason for disabling the RAID 6 ability on that server… but that is most likely a software thing…

It does seem to be a nice RAID controller, though… much newer than the one I got for my server, which will do RAID 6 and RAID 60.

This is from another discussion on the HPE forum… they talk about those features being tied to the battery-backed cache or something like that, so if your battery isn't working the controller will not go beyond RAID 5:

Standard on the P440 are RAID 6, RAID 60, RAID 1 ADM, Capacity Expansion, mirror split, recombine, and rollback in Online Mode, Drive Erase, Performance Optimization-Degraded Reads and Read Coalescing, Move/Delete any individual LUNS

Either you can most likely flash the controller back to being a standard LSI controller… or you can disable it and go ZFS…

ZFS is a good deal of work (stuff to learn) and won't work great on Windows… even though it's getting closer every day.