Suppose the following:
A node has been running with 100% availability for 24 months.
In the 25th month, one of the disks fails and the SNO takes the node down for 12 hours to rebuild the disk array (rebuild from parity).
After the node has been offline for 12 hours, the SNO restarts the Docker container using the same identity, same storage-dir, same everything.
Would this be a good strategy to handle a failed disk?
Would the node be DQ’d for being offline for 12h (exceeding the SLA limit of 5h of downtime per month), even after having 2 years of 100% availability?
If a small number of files (for example, 5 out of all the files in the storage/ directory) could not be recovered from parity, could the node still resume with its original identity while holding most of its files?
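To put the downtime question in numbers, here is a rough sketch. It assumes a 30-day month and takes the 5h-per-month figure at face value; I don't know how the network actually measures or averages downtime, so the function and constant names here are only illustrative.

```python
# Assumption: a 30-day month; the real accounting window may differ.
HOURS_PER_MONTH = 30 * 24


def availability(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Fraction of the period during which the node was online."""
    return 1 - downtime_hours / period_hours


# 5h/month allowance (the SLA figure quoted in this post).
allowed = availability(5)
# A single 12h rebuild, counted against one month only.
rebuild_month = availability(12)
# The same 12h outage, averaged over all 25 months of operation.
lifetime = availability(12, 25 * HOURS_PER_MONTH)

print(f"allowed per month:   {allowed:.4%}")
print(f"month with rebuild:  {rebuild_month:.4%}")
print(f"averaged over 25 mo: {lifetime:.4%}")
```

So the answer seems to depend on whether downtime is judged month by month or over the node's whole history.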
I’m trying to figure out the best way to handle a future disk failure and still preserve the node.
I read about the strategy of running “one node per disk”, and discarding the node if a disk fails, but this seems wasteful if 99.9% of the files can be recovered using a disk array with parity.
“Parity is a waste of space” you say? In my setup, I already have parity protection for my personal files, so I’ve already “lost” the space. I can protect the storj directory “for free”, but the rebuild will take ~12 hours.