Question about data resiliency during a cataclysmic event

I have a few questions about data resiliency on Tardigrade:

  1. How decentralized and highly available is the satellite network currently? Is a single satellite actually hosted across geographically diverse datacenters, with a proper high-availability setup?
  2. How resilient is the data stored on Tardigrade? If a Tardigrade user is using the Asia satellite, does that mean the data is only distributed among SNOs in Asia? Or is it distributed to the EU / NA and other regions as well?

I’m considering a few possibilities of data outage due to a cataclysmic event in a certain region, which 99.9% of storage service providers would usually consider force majeure, for example:

  1. A war breaks out in a certain country and the internet is being used for propaganda, forcing the government to shut down the internet for the whole country. Do we have any protection to ensure that users’ data is distributed across many countries?
  2. An earthquake ruptures the ocean fiber lines across the Atlantic, causing a network outage between the US and the EU. The US satellite won’t be able to find any data in the EU. Will US-based Tardigrade users have enough local SNOs to download the data from?

These are good questions; I was wondering about that myself.

To 2.1: Since the fastest (and typically geographically closest) nodes always win the upload race, how widespread is the data really? If I upload from Germany, will my data also be stored in the US/Asia?

If for some reason the EU decides to ban STORJ (highly unlikely, just hypothetical), I want to be able to download my data when I get to the US.

I have thought about that too. Typically you would have to make sure data is being uploaded to every region. I wonder if selecting the fastest nodes is the best decision in terms of redundancy in case of a regional failure.

Typically you would have to make sure data is being uploaded to every region

Exactly, like a full slab of data on each continent.

If you want something to be resilient against the failure of an entire continent, you’re talking about something that would need this specifically implemented into it. But it would come at a cost.
Right now, from what I can tell, Storj is not yet built for that kind of redundancy. It’s very well possible that most pieces are stored on the same continent and not enough would be available outside of it to recreate the segment. Especially if you consider that the repair threshold is set to 35 by default: at that point you need 29 out of those 35 pieces, so it’s highly unlikely you’d still reach that threshold if either North America or Europe goes down. Ensuring this kind of redundancy means ensuring enough pieces on every continent. This requires higher RS values and a much higher expansion factor for the data, and tied to that, obviously, also a higher cost.
This is not to say that this kind of redundancy won’t be possible to add to the network. It’s just not a one-implementation-fits-all kind of thing. Most users probably aren’t interested in spending much more to defend against failure or disconnection of entire continents. But it could definitely be implemented as an option for those who are.
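
A quick back-of-the-envelope sketch of that scenario in Go (the 29/35 thresholds are the ones mentioned above; the per-continent piece counts are a made-up worst case):

```go
package main

import "fmt"

// Thresholds mentioned in the post above (Storj defaults at the time):
const (
	minPieces = 29 // pieces needed to reconstruct a segment
	repairAt  = 35 // repair is triggered once availability drops to this
)

// survives reports whether a segment is still recoverable after losing
// every piece stored on the given continent.
func survives(piecesByContinent map[string]int, lost string) bool {
	remaining := 0
	for continent, n := range piecesByContinent {
		if continent != lost {
			remaining += n
		}
	}
	return remaining >= minPieces
}

func main() {
	// Hypothetical worst case: the segment sits right at the repair
	// threshold (35 pieces) and most of them landed in Europe.
	dist := map[string]int{"EU": 25, "NA": 8, "AS": 2} // 25+8+2 == repairAt
	fmt.Println("survives EU loss:", survives(dist, "EU")) // false: only 10 pieces left
	fmt.Println("survives NA loss:", survives(dist, "NA")) // false: only 27 pieces left
	fmt.Println("survives AS loss:", survives(dist, "AS")) // true: 33 pieces left
}
```

With that example distribution, losing either of the two biggest continents leaves fewer than the 29 pieces needed, which is exactly the failure mode described above.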

We see ongoing internet isolation from countries like Russia, with the ability to disconnect an entire country from the global internet.
So my guess is Tardigrade would not be working in such a country if they trigger the kill switch? Which would mean not only that Russian customers would no longer be able to access their uploaded files, but also that global customers could lose access if their files, or parts of them, are stored on e.g. Russian nodes. How can this be prevented?

It is relatively easy to prevent from the tech side - just:
1 - Expand the data more (e.g. 29/150 erasure coding instead of the current 29/80, which raises the expansion factor from roughly 2.8x to about 5.2x)

2 - Apply an additional IP filter to storage node selection. One already exists, but it is simple: it just excludes nodes from the same /24 IP subnet - 1 node per subnet. Add a new rule to this filter based on country (e.g. no more than ~20-30 nodes from the same country are selected for storing a file); see the sketch after this list

3 - Adjust the file repairer module to track not only the total number of pieces still available on the network, but also the counts per country / continent, and trigger the repair process if either drops below a threshold. Or even just a simple redistribution (like is done at graceful exit, where data is transferred directly from one node to another without uploading it to the satellite) instead of a full repair, if the total number of pieces is still high enough and is merely unevenly distributed across the globe
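
A minimal sketch in Go of what such a country-aware selection filter might look like (the Node fields, the GeoIP-derived Country attribute, and the caps are illustrative assumptions, not the actual satellite code):

```go
package main

import "fmt"

// Node is a simplified stand-in for a storage node candidate;
// Country would come from a GeoIP lookup on the node's address.
type Node struct {
	ID      string
	Subnet  string // /24 subnet, as in the existing filter
	Country string // hypothetical GeoIP-derived attribute
}

// selectNodes extends the existing "one node per /24 subnet" rule
// with a per-country cap, as proposed in point 2 above.
func selectNodes(candidates []Node, want, maxPerCountry int) []Node {
	var picked []Node
	seenSubnet := map[string]bool{}
	perCountry := map[string]int{}
	for _, n := range candidates {
		if len(picked) == want {
			break
		}
		if seenSubnet[n.Subnet] || perCountry[n.Country] >= maxPerCountry {
			continue // same subnet already used, or country quota full
		}
		seenSubnet[n.Subnet] = true
		perCountry[n.Country]++
		picked = append(picked, n)
	}
	return picked
}

func main() {
	candidates := []Node{
		{"a", "1.2.3.0/24", "DE"},
		{"b", "1.2.3.0/24", "DE"}, // skipped: same /24 subnet as "a"
		{"c", "5.6.7.0/24", "DE"},
		{"d", "8.9.1.0/24", "US"},
	}
	// Pick up to 3 nodes, at most 2 per country.
	fmt.Println(selectNodes(candidates, 3, 2))
}
```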

But it will cost significantly more. So it can only be implemented as an additional pricing plan for those who want/need it (who take such risks seriously) and agree to pay more for additional data durability against political risks and planetary-scale cataclysmic events.

But storage itself (HDD space) is relatively cheap, and the amount of traffic (which costs a lot) does not increase much. So total end prices for customers can still be affordable and lower than those of current centralized “cloud” storage companies.
And it is definitely much cheaper than storing 2-3 copies of the data in 2-3 centralized datacenters on different continents. Centralized clouds are even more vulnerable to such risks, and you would need to use a few different clouds to mitigate them.

For global downloaders it would be sufficient if their data is stored in different geographical locations.
I think something like this is a must and should be implemented.

For e.g. Russian downloaders it seems to be the opposite: it would have to be ensured that all data is stored at least on e.g. Russian nodes. I assume this would require e.g. Russian satellites that can run independently in case of a country-wide disconnection and would resume normal operation when the disconnection is over.
Such an implementation sounds very interesting.

I’d say:

  1. Implement GeoIP-based node selection, with at least a fully recoverable slab on 2 different continents.
  2. Restrict how much of a slab can be stored in one country (again, via GeoIP); a rough sketch of such a check follows below.

This sacrifices a little bit of performance, but gives much better resiliency.
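
A rough sketch in Go of what checking those two rules could look like (the thresholds, the per-country cap, and the continent/country layout are illustrative assumptions):

```go
package main

import "fmt"

const minPieces = 29 // pieces needed to reconstruct a segment

// Placement records where one piece of a segment landed
// (both fields would come from a GeoIP lookup in practice).
type Placement struct {
	Continent string
	Country   string
}

// satisfies checks the two rules proposed above: a fully recoverable set
// of pieces on at least two continents, and no single country holding
// more than maxPerCountry pieces.
func satisfies(pieces []Placement, maxPerCountry int) bool {
	perContinent := map[string]int{}
	perCountry := map[string]int{}
	for _, p := range pieces {
		perContinent[p.Continent]++
		perCountry[p.Country]++
	}
	for _, n := range perCountry {
		if n > maxPerCountry {
			return false
		}
	}
	recoverable := 0
	for _, n := range perContinent {
		if n >= minPieces {
			recoverable++
		}
	}
	return recoverable >= 2
}

func main() {
	// Hypothetical layout: 30 pieces in Germany (EU), 30 in the US (NA).
	var pieces []Placement
	for i := 0; i < 30; i++ {
		pieces = append(pieces, Placement{"EU", "DE"}, Placement{"NA", "US"})
	}
	fmt.Println(satisfies(pieces, 30)) // true: both continents independently recoverable
}
```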

All of that is only relevant if the satellite survives the event. Otherwise it does not matter whether the nodes have the data or not - nobody can access it.


That’s true. In the case of a governmental shutdown, this could be the case if there were some kind of regional satellite within the closed-down boundaries.