Do I need a RAID or mirrored array

Let's translate that risk into average costs and compare.

Let me start by linking to a source for annualized failure rates, as I don't want to be accused of pulling numbers out of my ***.


They report an average of 1.8%, but let's go with 2% to make the math easier.

In the first year you'll indeed have 3 months at 25% payout, 3 months at 50%, 3 months at 75%, and 3 months at 100%. But let's say that, on average, you lose 50% of your first-year earnings if your node fails during that year. It'll be more if it fails earlier in the year, less if it fails later. During the next year 50% of the amount held in escrow is paid back to you, so that percentage drops to 25% of the first-year income.
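That payout schedule can be sketched numerically. This is a rough model, assuming constant monthly income; the schedule values come from the quarters above, and the function name is just for illustration:

```python
# Payout schedule from the text: 25% paid out in months 1-3, 50% in
# months 4-6, 75% in months 7-9, 100% in months 10-12; the rest is escrow.
paid_fraction = [0.25] * 3 + [0.50] * 3 + [0.75] * 3 + [1.00] * 3

def loss_if_failed_after(month, monthly_income=1.0):
    """Fraction of earnings-to-date forfeited if the node fails after
    `month` months (everything still held in escrow is lost)."""
    earned = month * monthly_income
    paid = sum(paid_fraction[:month]) * monthly_income
    return (earned - paid) / earned

losses = [loss_if_failed_after(m) for m in range(1, 13)]
print([round(l, 3) for l in losses])
# The loss runs from 75% for an early failure down to 37.5% at year end,
# which is where the rough ~50% average comes from.
```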

Scenario 1: 1 node on RAID5

Let's assume perfect protection in a RAID5 setup and a 0% failure rate. This setup can provide 2/3rds of the storage capacity to the network and gets the theoretical 100% of possible income on that 2/3rds.

Scenario 2: 3 nodes for 3 disks

2% (failure risk) * 50% (loss when node fails) = 1% loss of total income on average in the first year. Because escrow is partially paid back, that risk drops to a loss of 0.5% per year of the income the node made in its first year.
Of course, if you do lose a node, you have to collect new data again. It's hard to predict how long that will take in a production scenario, but let's assume it takes 2 years to get back up to the same level and that data grows evenly over those 2 years. That means over those 2 years you essentially lose a year's worth of income for the failed node. This cost can be expressed as 2% (chance of failure) * 100% (loss when node fails) = 2%. If the failure happens during the node's first 2 years the loss is of course lower, but let's ignore that. The total loss from escrow plus having to collect new data comes down to about 3% per year on average.
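Putting those numbers together in a minimal sketch (the variable names are my own, the percentages are the ones used above):

```python
failure_rate = 0.02       # rounded-up annualized failure rate
escrow_loss = 0.50        # average share of first-year income lost to escrow on failure
recollection_loss = 1.00  # ~one year's income lost while refilling a new node

escrow_cost = failure_rate * escrow_loss              # 1% of first-year income
recollection_cost = failure_rate * recollection_loss  # 2% per year
total_cost = escrow_cost + recollection_cost          # ~3% per year on average
print(escrow_cost, recollection_cost, round(total_cost, 2))
```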
The upside is that you are able to share 100% of your drive capacity with the network.

Conclusion

Scenario 1 is indeed the better option if you already have the hardware lying around and the network never sends your node more data than 2/3rds of the capacity you have available. About 3% more profitable, to be exact.
As soon as the network sends your nodes more data than that 2/3rds share (or, to be exact, 103% of that 2/3rds share), scenario 2 becomes more profitable. How much more profitable? Between 0% and 30%, depending on how much of your storage gets filled: 30% when all space is used.
I'll take the tiny risk of node loss now over the certainty of a significant long-term loss of income from simply sharing less space.

Of course, this equation may change if you have more HDD space lying around than the network can reasonably ever fill. Then by all means use RAID (though even then, avoid RAID5, as mentioned by @Alexey).

All this assumes you already have 3 HDDs lying around. If you have to justify spending money to buy that third drive, it becomes even harder to argue for a RAID setup, because recouping the cost of the additional HDD with the average 3% extra income would probably take more than a lifetime.

I think I have been extremely generous in this assessment: using higher failure rates than Backblaze measured, assuming perfect protection from RAID5, and assuming very long recovery times for collecting new data on new nodes. Still, I can't find good reasons to go with RAID over single-disk nodes unless you already have a large number of large drives lying around. If you disagree, please point out specifically where my calculation is incorrect.
