Do you want to consider a catastrophic failure or a deliberate action? Even if a lot of SNOs are angry, there is an incentive for many of them to first send data to other SNOs before shutting down their nodes: graceful exit, which pays money. This is currently available for nodes that are at least 6 months old, and one could argue that younger nodes just don’t have enough data to matter (though I guess only Storj has statistics to confirm).
In case of a catastrophic failure, you can start by consulting section 7.3 of the Storj Whitepaper. It contains some math and simulations that attempt to answer your question: the parameters for the network were initially selected to have safety better than what the competition offers. There are three caveats when reading the paper, one working in our advantage and two against:
- The simulations assume the smallest time period is a month, but this works in our favor: it means they assume the soonest Storj can take action is the next month, while in practice Storj can act much more often.
- The simulations talk about single chunks. Currently a single chunk is about 2.5 MB, and a single file may be composed of many chunks. Loss of any chunk essentially means loss of the whole file. Hence if you compute the probability of losing a single chunk to be X (say, 10⁻²⁰), the probability of losing a file consisting of 42 chunks is 1-(1-X)⁴² (in this case about 4.2×10⁻¹⁹). The larger the file, the bigger this effect becomes.
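To put numbers on this, here is a quick sketch. Note that the naive formula underflows in double precision when X is this small, so it uses `log1p`/`expm1`; the per-chunk probability and chunk count are the illustrative values from above, not measured network figures.

```python
import math

# Illustrative numbers from the text, not measured network figures.
chunk_loss = 1e-20   # assumed probability of losing any single chunk
chunks = 42          # number of chunks making up one file

# Naive formula underflows: (1 - 1e-20) rounds to exactly 1.0 in a double.
naive = 1 - (1 - chunk_loss) ** chunks        # gives 0.0, uselessly

# Numerically stable version: 1 - (1-X)^n = -expm1(n * log1p(-X))
file_loss = -math.expm1(chunks * math.log1p(-chunk_loss))
print(file_loss)  # ≈ 4.2e-19
```

The stable version also makes the scaling obvious: for tiny X, the file-loss probability is essentially n·X, so it grows linearly with the number of chunks.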
- The math doesn’t take correlation of failures into account:
Consider a case where two nodes initially hosted separately are migrated to a single piece of hardware. For any chunk that happened to have stripes on both of those nodes, before the migration a failure could take down only one of those stripes at a time; now it takes down both at once. This is effectively equivalent to the loss of one stripe of redundancy. It would be possible to simply repair the affected chunks in the regular way, but so far we have no word from Storj that this actually happens, so for the purpose of risk estimation we must assume it doesn't.
Given a large enough number of nodes this is not a problem though, again because of all the redundancy already in the network: it is very unlikely that a given chunk will be affected by such a migration more than once. I believe Storj has had more than enough nodes for a long time now not to be affected by this problem. However, only satellite operators (currently only Storj employees) can measure the actual correlation in the network.
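As a back-of-the-envelope check of that "very unlikely" claim, the sketch below estimates how likely a single two-node merger is to hit two stripes of the same chunk. The stripe count and network size are assumptions picked for illustration, not Storj's actual parameters.

```python
from math import comb

# Assumed illustrative parameters, not Storj's actual figures.
stripes_per_chunk = 80   # stripes of one chunk, each on a distinct node
network_nodes = 20_000   # total nodes in the network

# Probability that one random pair of merged nodes holds two stripes
# of a given chunk: C(k, 2) "bad" pairs out of C(N, 2) possible pairs.
p_chunk_hit = comb(stripes_per_chunk, 2) / comb(network_nodes, 2)
print(p_chunk_hit)  # ≈ 1.6e-5 per merger, per chunk
```

So even under these rough assumptions, a given chunk would need on the order of tens of thousands of independent mergers before being hit twice, which supports the intuition above.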
One last thing: in the case of a large-scale failure, raw capacity is very easy and cheap to rebuild on top of regular cloud storage, and the current tooling already allows that. It won't be as massively redundant as a proper SNO-based network, but it will be enough to provide a temporary means of surviving the failure.