5 nodes on the same HDD vs 5 nodes on separate disks

humbfig · November 19, 2022, 4:38pm

Same location, same hardware and same disk apparently bring more correlation to failures. But you cope well with unreliable hardware (RPis) connected (by USB!) to a series of old, individually unreliable disks. Yet the idea of having more than one node on a big, new, reliable disk(*) somehow gets under your skin…

In the end it’s still a matter of statistics. 20TB going down is 20TB going down. If the 20TB that goes down is spread over 20 different nodes (on the same disk) instead of sitting in a single 20TB node, it shouldn’t make any difference to the network, precisely because data is distributed per IP (not per node!). So the data held on twenty 1TB nodes sharing one 20TB disk is no more correlated than the data held on a single 20TB node on that same disk.
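To put the statistics in concrete terms, here is a rough toy sketch in Python. It is not how the network actually selects nodes; it only assumes, as I argued above, that a segment puts at most one piece behind any given IP. The function name, the parameters and all the numbers (50 IPs, 10 pieces per segment, 1000 segments) are made up for illustration.

```python
import random

def simulate_disk_failure(nodes_on_disk, segments=1000, ips=50,
                          pieces_per_segment=10, seed=0):
    """Toy model: the disk behind IP 0 dies, taking every node on it along.

    Returns (worst_pieces_lost_in_one_segment, total_pieces_lost).
    """
    ip_rng = random.Random(seed)        # drives the per-IP piece placement
    node_rng = random.Random(seed + 1)  # drives the choice of node behind an IP
    failed_ip = 0
    worst = 0
    total = 0
    for _ in range(segments):
        # Assumption: at most one piece of a segment lands on any one IP.
        chosen_ips = ip_rng.sample(range(ips), pieces_per_segment)
        lost = 0
        for ip in chosen_ips:
            if ip == failed_ip:
                # Pick one of the nodes behind this IP. The choice is
                # irrelevant to the loss: they all sit on the failed disk.
                node_rng.randrange(nodes_on_disk)
                lost += 1
        worst = max(worst, lost)
        total += lost
    return worst, total

single = simulate_disk_failure(nodes_on_disk=1)   # one 20TB node
many = simulate_disk_failure(nodes_on_disk=20)    # twenty 1TB nodes, same disk
print(single, many)
```

Under that assumption both layouts lose exactly the same pieces, and no segment ever loses more than one piece to the dead disk. If placement were per node instead of per IP, the picture would be different, but that is not the case I am describing.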

(*) I said “reliable” because I wouldn’t care much about the nodes if I had twenty 1TB disks holding 20 nodes. But if I had one 20TB disk holding twenty 1TB nodes, you bet I would take care of it. I wouldn’t let the disk get too old, and I would move the nodes if the disk started showing signs of trouble.