2nd node or just increase the 1st node?

That article, and other derivatives of the original one that first appeared on ZDNet (I think), are flawed and massively overestimate the probability of failure; their results contradict common sense. If their conclusion (the last sentence in the quote above) were even remotely plausible, we would be finding new bad/rotten sectors and seeing checksum failures on “almost” every scrub (which, as a side note, should be scheduled periodically to keep data viable!), because a scrub involves exactly the same steps as a rebuild: read data from all disks, compute and compare checksums. This, of course, does not happen in reality.
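To make the contradiction concrete, here is a minimal sketch of the naive calculation those articles rely on. The figures are assumptions for illustration, not from the post: the commonly quoted 1e-14 URE-per-bit spec for consumer drives and a rebuild that has to read about 12 TB. A scrub reads the same data, so the same formula would “predict” a failure on most scrubs, which plainly does not happen:

```python
import math

def naive_ure_failure_prob(tb_read: float, ure_per_bit: float = 1e-14) -> float:
    """P(at least one unrecoverable read error) if every bit read
    independently failed at the quoted spec-sheet rate."""
    bits_read = tb_read * 1e12 * 8               # TB -> bits
    # 1 - (1 - p)^n, computed in a numerically stable way
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# ~62% "predicted" failure for reading 12 TB during a rebuild -- yet a
# monthly scrub reads the same data and, in practice, almost never
# reports an error. That gap is exactly the point made above.
print(f"{naive_ure_failure_prob(12):.0%}")
```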

I recommend that RAID5 owners (and especially Synology owners, who run btrfs on top of md RAID5, where a scrub scans the entire disk surface regardless of utilization) look at their (hopefully at least monthly) scrub logs for the past few years and see how many corrections were actually made. You’ll be surprised how close the number is to 0 and how far that is from the predicted mayhem.
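As a quick way to do that exercise on plain Linux md (which is what Synology uses underneath), the standard md sysfs attributes can be read directly; this is just a sketch assuming such a setup:

```python
import glob
from pathlib import Path

# mismatch_cnt is the number of inconsistent sectors found by the last
# check/repair (scrub) pass. On a periodically scrubbed, healthy RAID5
# array it is almost always 0.
for md in sorted(glob.glob("/sys/block/md*/md")):
    name = Path(md).parent.name
    level = (Path(md) / "level").read_text().strip()
    mismatches = (Path(md) / "mismatch_cnt").read_text().strip()
    print(f"{name}: level={level}, mismatch_cnt={mismatches}")

# For ZFS/RaidZ1 the equivalent is the "scrub repaired ... with N errors"
# line shown by `zpool status`.
```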

Bottom line: RAID5, RaidZ1, and other single-disk fault tolerance arrangements with a reasonably small number of disks (I’d say up to 12-16) are totally fine, provided they are scrubbed periodically (see the sketch below for what that amounts to in practice). If the array has not been scrubbed for years, then sure, I would not trust it to rebuild either, nor would I trust the data to be viable.
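For reference, “scrubbed periodically” on plain Linux md boils down to kicking off a check pass on a schedule (Debian/Ubuntu already ship this as a monthly cron job via /usr/share/mdadm/checkarray; ZFS has `zpool scrub`). A sketch, assuming a single array named md0 and root privileges:

```python
from pathlib import Path

# Writing "check" to sync_action starts a read-and-compare pass over the
# whole array -- the same work a rebuild would do, which is why clean
# scrub results are good evidence a rebuild will succeed.
Path("/sys/block/md0/md/sync_action").write_text("check\n")
```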

(Note: RaidZ1, unlike RAID5, is special in that a replace operation on the vdev preserves the redundancy afforded by all present disks, including the one being replaced, for the duration of the replacement; it is therefore much safer, so it is not really a fair comparison.)