Geofencing and advanced placement-constraint support #4227

Please note that a new blueprint has been published to support geofencing:

All comments are welcome.

This sounds very interesting and is very much needed.

Depending on the implementation, my main suggestion would be to make different selections available for the customer:

  • Selection by regulation the customer has to comply with (e.g. GDPR): The idea would be that all they have to do is tick a ‘GDPR’ checkbox in the GUI, and the region/country selection is made automatically for them.
  • Selection by country: We saw in the thread about the French health system that there are cases where specifying e.g. the EU as a region is not enough and country-specific storage is required. I can also tell from the German perspective that for certain use cases, such as the entire public sector, storage in Germany would be more appealing or even required.
  • Selection by broader regions (maybe with an option to exclude specific countries where you don’t want your data stored under any circumstances. I hate to say it, but China, Russia, Iran, Turkey could be such countries from a German perspective.)

Anyways I believe such a feature could justify a premium if a customer enables it for a bucket.

Thanks for the feedback, @jammerdan

One of the biggest challenges in the current design is storing the constraint. We need to store it in the segments table where we can have millions of segments.

The current proposal suggests using two bytes to store the constraint. That should be enough to define 65,536 different constraints, for example:

  1. To store data only in specific countries
  2. To not store data in one specific country
  3. To define multiple regions (even with negated countries)

The node table will store the country code of each storage node (IPv4-based, best effort, for now).

This means that all of the use cases you mentioned will be possible with the code modifications. (However, the first implementation may include only regional constraints for EU/US; the others can be implemented easily later…)
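To make the two-byte idea concrete, here is a minimal sketch in Python. It assumes the two bytes are simply an ID into a table of predefined rules. The rule table, the function names and the country subsets are all made up for illustration, and the blueprint may encode things differently.

```python
import struct
from typing import Callable, Dict

# Illustrative subset only, not the real list of EU members.
EU = frozenset({"DE", "FR", "NL"})

# Hypothetical rule table: each 16-bit ID names one predefined constraint.
RULES: Dict[int, Callable[[str], bool]] = {
    0: lambda cc: True,                     # no constraint
    1: lambda cc: cc == "DE",               # store in one country only
    2: lambda cc: cc != "CN",               # never store in one country
    3: lambda cc: cc in EU and cc != "FR",  # region with a negated country
}


def pack_constraint(rule_id: int) -> bytes:
    """Encode a rule ID as two big-endian bytes (0..65535)."""
    return struct.pack(">H", rule_id)


def node_allowed(stored: bytes, node_country: str) -> bool:
    """Check a node's country code against the constraint stored on a segment."""
    (rule_id,) = struct.unpack(">H", stored)
    return RULES[rule_id](node_country)


if __name__ == "__main__":
    eu_without_fr = pack_constraint(3)
    print(len(eu_without_fr), node_allowed(eu_without_fr, "NL"))
```

With this kind of layout the segments table only ever holds the two-byte ID, and the meaning of each ID lives in code or configuration.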

It is worth noting that initially we don’t plan any user interface for configuring geofencing. It would be only possible via a support ticket. Once we learn more about what our customers really need, we will look at designing a proper user interface.

I am linking a few posts where customers/users have shown interest in such feature.

There are 2^195 = 5.0216814e+58 different possible lists of countries. I think you’re going to have to make some choices on what to support. :wink: Regulatory regions would make sense I guess.
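For anyone who wants to check the arithmetic, here is a quick Python sketch. The only input is the figure of 195 countries used above; the variable names are just for illustration.

```python
import math

COUNTRIES = 195  # the country count used in the post above

# Each country is either in a hand-picked list or not, so there are 2**195 lists.
possible_lists = 2 ** COUNTRIES
print(f"{possible_lists:.7e}")

# Telling all of those lists apart takes one bit per country.
bits_needed = COUNTRIES
bytes_needed = math.ceil(bits_needed / 8)
print(bits_needed, bytes_needed)

# By contrast, a two-byte field can name only 2**16 predefined constraints.
print(2 ** 16)
```

So an arbitrary country list would need a 195-bit mask (25 bytes) per segment, which is why predefined regions fit a two-byte field and hand-picked lists do not.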

A typical case is when data should be placed in a certain geographic region (like US or EU) due to legal requirements.

This is really exciting, and very long overdue! Do you have planned timescales for when this legal compliance will be in place for Storj? It will really help with client onboarding, I think.

Also, I was thinking that, as you will know where Storj node operators are, could there be an option for the client to have their upload spread over unique countries when node selection happens? So instead of purely random selection, it would be random across all countries, so that the pieces of a file are not all stored on nodes in the same country, like France or Germany, which statistically have more nodes than India according to http://storjnet.info/.
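A rough sketch of what "random over all countries" could look like, in Python. This is only an illustration under my own assumptions: nodes are plain `(node_id, country_code)` pairs, and `select_spread` is a made-up name, not anything in the Storj code base.

```python
import random
from collections import defaultdict


def select_spread(nodes, count, rng=random):
    """Pick up to `count` node IDs, cycling over countries so that each
    country is used as evenly as possible before any country repeats.

    nodes: iterable of (node_id, country_code) pairs.
    """
    by_country = defaultdict(list)
    for node_id, country in nodes:
        by_country[country].append(node_id)
    for ids in by_country.values():
        rng.shuffle(ids)  # random choice within each country

    countries = list(by_country)
    rng.shuffle(countries)  # random order between countries

    chosen = []
    while len(chosen) < count and countries:
        for country in list(countries):
            if len(chosen) == count:
                break
            chosen.append(by_country[country].pop())
            if not by_country[country]:
                countries.remove(country)  # this country has no nodes left
    return chosen


if __name__ == "__main__":
    nodes = [("de1", "DE"), ("de2", "DE"), ("fr1", "FR"), ("in1", "IN")]
    print(select_spread(nodes, 3))
```

With three pieces and three countries available, each piece lands in a different country, even though Germany has more nodes in the pool.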

I don’t think Storj, as designed today, will ever be able to comply with geofencing laws. Ever. For one reason: SNOs are free to move their nodes between countries, and Storj has zero leverage on stopping a hard disk from being moved abroad.

What it will allow, though, would be a priori selecting nodes that will be close to downloading users (as opposed to being close/fast for the uploader). So, probably still useful, but nowhere near the legal constraints required for HIPAA et al.

Hence a 16-bit field might be enough to act as a bitmask for 16 world regions, without splitting hairs over whether a given piece is to be stored in Liechtenstein or Monaco. Maybe it could even act only as an expression of soft preferences, as opposed to hard constraints, i.e. prefer nodes from a given region, but select other nodes if there aren’t enough in that region.
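A minimal sketch of that bitmask idea in Python, assuming one bit per world region and a flag that switches between a soft preference and a hard constraint. The region names, bit assignments and function names are hypothetical.

```python
import random

# Hypothetical region table: one bit per region, at most 16 entries.
REGION_BITS = {"EU": 1 << 0, "NA": 1 << 1, "ASIA": 1 << 2}


def region_mask(*regions):
    """Combine region names into a single 16-bit mask."""
    mask = 0
    for name in regions:
        mask |= REGION_BITS[name]
    return mask


def select_nodes(nodes, mask, count, hard=False, rng=random):
    """Pick up to `count` node IDs, preferring nodes whose region is in `mask`.

    nodes: list of (node_id, region_name) pairs.
    hard=True  -> never leave the masked regions (may return fewer nodes).
    hard=False -> fill any shortfall with nodes from other regions.
    """
    preferred = [n for n, r in nodes if (REGION_BITS[r] & mask)]
    others = [n for n, r in nodes if not (REGION_BITS[r] & mask)]
    rng.shuffle(preferred)
    rng.shuffle(others)

    chosen = preferred[:count]
    if not hard:
        chosen += others[:count - len(chosen)]
    return chosen


if __name__ == "__main__":
    nodes = [("a", "EU"), ("b", "EU"), ("c", "NA"), ("d", "ASIA")]
    print(select_nodes(nodes, region_mask("EU"), 3))             # soft: 3 nodes
    print(select_nodes(nodes, region_mask("EU"), 3, hard=True))  # hard: 2 nodes
```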

On an IP address change, it seems easy enough to delete geo-restricted pieces and flag the segment for repair if necessary.
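Something along these lines, sketched in Python with a made-up in-memory model. `Segment`, its fields and `repair_threshold` are assumptions for illustration only, not the satellite's real schema.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    allowed_countries: frozenset              # empty set = unrestricted
    piece_nodes: set = field(default_factory=set)
    needs_repair: bool = False


def handle_country_change(segments, node_id, new_country, repair_threshold):
    """Drop pieces that a node may no longer hold after its GeoIP country
    changed, and flag segments that fall below the repair threshold.

    Returns the number of pieces dropped.
    """
    dropped = 0
    for seg in segments:
        if node_id not in seg.piece_nodes:
            continue
        restricted = bool(seg.allowed_countries)
        if restricted and new_country not in seg.allowed_countries:
            seg.piece_nodes.discard(node_id)
            dropped += 1
            if len(seg.piece_nodes) < repair_threshold:
                seg.needs_repair = True
    return dropped


if __name__ == "__main__":
    eu_only = Segment(frozenset({"DE", "FR"}), {"n1", "n2", "n3"})
    anywhere = Segment(frozenset(), {"n1", "n4"})
    print(handle_country_change([eu_only, anywhere], "n1", "US", repair_threshold=3))
    print(eu_only.needs_repair, anywhere.piece_nodes)
```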

IMO geo-restrictions on E2E encrypted data show how out of touch lawmakers are with computer technology, especially when hackers can break into the original source computers where data is stored in the “right” country, unencrypted.

That’s too late for the laws in question.

Yeah, honestly, with this there could be some very cool things implemented. E.g. some people never want their files to be placed in certain countries for political reasons. I mean, with two bytes the sky is the limit. I cannot wait to see what will come out of this, but I can sense it will be good.

If an EU person uploads a file with their name and address, and the file is encrypted before upload, does the storage target possess the EU person’s protected data? I’d say no.

Storing E2E encrypted data is different than using s3cmd to upload a plaintext file to S3. If an EU resident tried to claim that Storj or a storage node operator transferred their data illegally according to the GDPR, I’d give them a copy of the storage node disk in question and tell them to show me what personal data of theirs is stored on that disk.

Before implementing geofencing, IMO, Storj should get written letters from customers saying this is the only thing blocking them from using Storj. A benefit of doing that is Storj can have conversations with actual users about how geofencing works, the safeguards taken to ensure pieces are not made available online outside the original geofence, etc.

It depends on how the law sees it and the law probably does not have an exception for encrypted data.

EU law is notoriously unspecific. I’m almost certain it simply isn’t addressed. Most of it is based on legal precedent. Without that, nobody knows how to interpret the law. It doesn’t help that every member country has their own implementation of the EU guidelines. These laws are quite new, so the precedent usually doesn’t exist yet. Brave new world and all…

There really need to be better laws that understand zero trust architectures.

The Storj protocol doesn’t exactly enforce encryption. The uplink client does, but it’s just client-side code. Nothing stops you from just getting rid of the encryption part.

If this was that easy, then any storage provider would just say “do E2E and you’ll be compliant”. Yet somehow this is not enough.

The big names are all USA FedRAMP Authorized: Amazon S3, MS Azure. This is a good step toward compliance.

SA-9(8) External System Services | Processing and Storage Location — U.S. Jurisdiction Restrict the geographic location of information processing and data storage to facilities located within in the legal jurisdictional boundary of the United States. The geographic location of information processing and data storage can have a direct impact on the ability of organizations to successfully execute their mission and business functions. A compromise or breach of high impact information and systems can have severe or catastrophic adverse impacts on organizational assets and operations, individuals, other organizations, and the Nation. Restricting the processing and storage of high-impact information to facilities within the legal jurisdictional boundary of the United States provides greater control over such processing and storage. SA-5, SR-4.

https://csrc.nist.gov/CSRC/media/Publications/sp/800-53/rev-5/final/documents/sp800-53r5-control-catalog.xlsx

Exactly. We can support any individual country (although that is reasonable only for countries with enough nodes) and pre-defined regions. I don’t think we should support hand-picked selections of countries (with 2^195 possible lists, as you wrote, that would need a 195-bit mask per segment instead of two bytes).

Thanks for all the feedback. I think the summary so far is the following:

  1. It’s a good idea to support more fine-grained placement policies.
  2. GeoIP-based geofencing may not be enough for hard legal requirements.

I can agree with both. And for this reason the design has two important parts:

  1. Extending the code base to make it possible to implement different types of placement policies / constraints.
  2. Implementing a simple GeoIP-based constraint for EU/EEA/US.

Part (1) will make it possible to support multiple different placement policies. Not just strict geofencing, but also, for example, the suggested “use as many countries as possible” rule.

Using this API, we can continuously evaluate and improve this feature.

(2) may not be enough legally in some cases, but it’s a starting point, and we can improve it over time as we will have the API in place…

(There are also open questions which should be addressed later, like moving the data in case of an IP change, or IPv6 support.)

Hi folks,

In order to make the discussion meaningful in the EU context, could someone from the Storj team please comment on how they classify the Storj company, as well as the other stakeholders (SNOs, and customers who use the satellites to reach the network provided by the SNOs)? There are two main actors that have to comply, and this is really the essence of the law: Data Controller and Data Processor… period.

You should agree with your consultants, which party falls in which category.
Yes, in the US I know there is another wall to hit, the ability to provide the data upon request for investigation etc… but it is close, as if you cannot technically do it, this becomes nonsense. Sometimes it is worth walking on the edge, instead of going back to the stone age.

A lot to be said, but first thing as mentioned shall be crystal clear… otherwise, just implementing something like the hyperscale’s datacenter approach limits the whole idea… we are now in the gave of the Internet vs. Intranet (latter is kind of controlled, centralized distributed system)

Cheers!
Feel free to DM me for a deeper discussion… I know a bit about this.

Reading these definitions, the “data controllers” are Storj users/customers.

The satellite operator (e.g. Storj Labs) falls under the definition of “data processor”, but their access to users’ data is limited because the data is encrypted. However, some very generic data points could be extracted, such as the total size of data uploaded, number of files, etc.

SNOs should be neither data controllers, nor data processors as they have no clue about the data of specific users.