Geofencing and advanced placement-constraint support #4227

elek · October 19, 2021, 9:26am

Please note that a new blueprint has been published to support geofencing:

github.com/storj/storj

Geofencing and advanced placement-constraint support

storj:main ← elek:geofencing

opened 09:21AM - 19 Oct 21 UTC

elek

+100 -0

**Created together with @mjpitz and @kaloyan-raev.** This pull request adds… a new blueprint document that introduces a new capability to add placement constraints (especially geofencing) for node allocations.

TL;DR:

* new field in `nodes` table with country code
* placement constraint (`int(2)`) in `bucket_metainfos` and `segment` table
* used during segment creation / repair / graceful exit
* countries are identified by geoip database

As discussed earlier in #694

All comments are welcome.

jammerdan · October 19, 2021, 10:08am

This sounds very interesting and is very much needed.

Depending on the implementation, my main suggestion would be to make different selections available for the customer:

Selection by the regulation the customer has to comply with (e.g. GDPR): the idea is that all they have to do is check a ‘GDPR’ checkbox in the GUI, and the region/country selection is made automatically for them.
Selection by country: We saw in the thread about the French health system that there are cases where specifying e.g. the EU as a region is not enough and country-specific storage is required. I can also say from the German perspective that for certain use cases, such as the entire public sector, storage in Germany would be more appealing or even required.
Selection by broader regions (maybe with an option to exclude specific countries where you don’t want your data stored under any circumstances; I hate to say it, but China, Russia, Iran, and Turkey could be such countries from a German perspective).

Anyways I believe such a feature could justify a premium if a customer enables it for a bucket.

elek · October 19, 2021, 10:36am

Thanks for the feedback @jammerdan

One of the biggest challenges in the current design is storing the constraint. We need to store it in the segments table where we can have millions of segments.

The current proposal suggests using two bytes to store the constraint. That should be enough to define 65536 different constraints:

To store data per countries
To not store data in one specific country
To define multiple regions (even with negated countries)
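As a rough illustration of how a two-byte value could cover all three cases, here is a minimal sketch. The constraint IDs, the registry, and the function names are hypothetical and not taken from the blueprint; only the two-byte width comes from the proposal, and the country choices are arbitrary examples.

```python
import struct

# The 27 EU member states as ISO 3166-1 alpha-2 codes.
EU = frozenset(
    "AT BE BG HR CY CZ DK EE FI FR DE GR HU IE IT LV LT LU MT "
    "NL PL PT RO SK SI ES SE".split()
)

# Hypothetical registry: constraint ID -> predicate over a node's country code.
# The IDs and the countries picked here are assumptions for illustration only.
CONSTRAINTS = {
    0: lambda cc: True,                     # no restriction
    1: lambda cc: cc in EU,                 # a region
    2: lambda cc: cc == "DE",               # a single country
    3: lambda cc: cc != "US",               # everything except one country
    4: lambda cc: cc in EU and cc != "FR",  # a region with a negated country
}


def encode_constraint(constraint_id: int) -> bytes:
    """Pack the ID into exactly two bytes (unsigned, big-endian)."""
    return struct.pack(">H", constraint_id)


def decode_constraint(raw: bytes) -> int:
    """Unpack a two-byte constraint ID."""
    return struct.unpack(">H", raw)[0]


def node_allowed(constraint_id: int, country_code: str) -> bool:
    """Check whether a node in the given country satisfies the constraint."""
    return CONSTRAINTS[constraint_id](country_code)
```

The point of the indirection is that the segments table only ever stores the small ID, while the meaning of each ID lives in code and can grow over time.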

The nodes table will store the country code of the storage nodes (IPv4-based, best effort, for now).

Which means that all of the use cases you mentioned will be possible with the code modifications. (However, the first implementation may include only regional constraints for EU/US, but others can also be implemented easily…)

kaloyan · October 19, 2021, 12:15pm

It is worth noting that initially we don’t plan any user interface for configuring geofencing. It would be only possible via a support ticket. Once we learn more about what our customers really need, we will look at designing a proper user interface.

nerdatwork · October 19, 2021, 12:51pm

I am linking a few posts where customers/users have shown interest in such a feature.

BrightSilence · October 19, 2021, 1:46pm

There are 2^195 = 5.0216814e+58 different possible lists of countries. I think you’re going to have to make some choices on what to support. Regulatory regions would make sense I guess.
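The arithmetic is easy to check. The sketch below takes the figure of 195 countries from the post above and compares the number of arbitrary country lists with what fits into two bytes; the variable names are only illustrative.

```python
COUNTRIES = 195  # figure used in the post above

# Every country is either in the list or not, so there are 2^195 lists.
possible_lists = 2 ** COUNTRIES

# A two-byte field can distinguish 2^16 values.
two_byte_values = 2 ** 16

# Bits needed to give every possible list its own value.
bits_needed = (possible_lists - 1).bit_length()

print(f"{possible_lists:.7e}")  # 5.0216814e+58
print(two_byte_values)          # 65536
print(bits_needed)              # 195
```

So an arbitrary list needs one bit per country (195 bits), far more than the 16 bits available, which is why a fixed set of predefined constraints is the practical option.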

Toyoo · October 19, 2021, 5:03pm

I don’t think Storj, as designed today, will ever be able to comply with geofencing laws. Ever. For one reason: SNOs are free to move their nodes between countries, and Storj has zero leverage on stopping a hard disk from being moved abroad.

What it will allow, though, would be a priori selecting nodes that will be close to downloading users (as opposed to being close/fast for the uploader). So, probably still useful, but nowhere near the legal constraints required for HIPAA et al.

Hence a 16-bit field might be enough to act as a bitmask for 16 world regions without splitting hairs over whether a given piece is to be stored in Liechtenstein or Monaco. Maybe it could even act only as an expression of soft preferences, as opposed to hard constraints, i.e. prefer nodes from a given region, but select other nodes if there aren’t enough in that region.
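A minimal sketch of that idea, assuming nodes are plain `(node_id, region)` pairs. The region names, their bit positions, and `select_nodes` are made up for illustration and are not part of Storj.

```python
from enum import IntFlag


class Region(IntFlag):
    # Hypothetical regions; a 16-bit mask leaves room for bits 0..15.
    EUROPE = 1 << 0
    NORTH_AMERICA = 1 << 1
    ASIA = 1 << 2


def select_nodes(nodes, preferred_mask, count, strict=False):
    """Pick up to `count` nodes, preferring those inside `preferred_mask`.

    nodes: list of (node_id, Region) pairs.
    strict=True behaves like a hard constraint (never leaves the mask);
    strict=False is the soft preference: fill up from other regions if
    the preferred ones do not have enough nodes.
    """
    preferred = [n for n in nodes if n[1] & preferred_mask]
    if strict or len(preferred) >= count:
        return preferred[:count]
    others = [n for n in nodes if not (n[1] & preferred_mask)]
    return preferred + others[: count - len(preferred)]


nodes = [
    ("n1", Region.EUROPE),
    ("n2", Region.ASIA),
    ("n3", Region.EUROPE),
    ("n4", Region.NORTH_AMERICA),
]
mask = Region.EUROPE | Region.NORTH_AMERICA
soft = select_nodes(nodes, Region.EUROPE, 3)
hard = select_nodes(nodes, Region.EUROPE, 3, strict=True)
```

With only two European nodes available, the soft variant returns three nodes (two European plus one other), while the strict variant returns just the two European ones.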

hashbackup · October 19, 2021, 5:53pm

On an IP address change, it seems easy enough to delete geo-restricted pieces and flag the segment for repair if necessary.
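Roughly, that could look like the sketch below. The `Segment` shape, the `repair_threshold` parameter, and the function name are assumptions made for illustration; the real satellite data model is different.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class Segment:
    # None means the segment has no geographic restriction.
    allowed_countries: Optional[FrozenSet[str]]
    piece_nodes: Set[str] = field(default_factory=set)
    needs_repair: bool = False


def handle_country_change(node_id, new_country, segments, repair_threshold):
    """Drop pieces that would violate a geofence after a node's country changes.

    repair_threshold is a hypothetical minimum number of healthy pieces;
    a segment falling below it is flagged for repair.
    """
    for seg in segments:
        if node_id not in seg.piece_nodes:
            continue
        if seg.allowed_countries is None or new_country in seg.allowed_countries:
            continue  # unrestricted, or the new location is still allowed
        seg.piece_nodes.discard(node_id)
        if len(seg.piece_nodes) < repair_threshold:
            seg.needs_repair = True


eu_only = Segment(frozenset({"DE", "FR"}), {"a", "b", "c"})
anywhere = Segment(None, {"a", "d"})
handle_country_change("a", "US", [eu_only, anywhere], repair_threshold=3)
```

After the call, node `a` no longer holds a piece of the restricted segment and that segment is flagged for repair, while the unrestricted segment is untouched.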

IMO geo-restrictions on E2E encrypted data show how out of touch lawmakers are with computer technology, especially when hackers can break in to the original source computers where data is stored in the “right” country, unencrypted.

Toyoo · October 19, 2021, 6:27pm

That’s too late for the laws in question.

Skyblockpro1 · October 19, 2021, 6:49pm

Ye honestly with this there could be some very cool things implemented. E.g. some people never want their files to be placed in certain countries for political reasons. I mean with two bytes the sky is the limit, cannot wait to see what will come out of this but I can sense it will be good

hashbackup · October 19, 2021, 7:11pm

If an EU person uploads a file with their name and address, and the file is encrypted before upload, does the storage target possess the EU person’s protected data? I’d say no.

Storing E2E encrypted data is different than using s3cmd to upload a plaintext file to S3. If an EU resident tried to claim that Storj or a storage node operator transferred their data illegally according to the GDPR, I’d give them a copy of the storage node disk in question and tell them to show me what personal data of theirs is stored on that disk.

Before implementing geofencing, IMO, Storj should get written letters from customers saying this is the only thing blocking them from using Storj. A benefit of doing that is Storj can have conversations with actual users about how geofencing works, the safeguards taken to ensure pieces are not made available online outside the original geofence, etc.

Pentium100 · October 19, 2021, 7:25pm

It depends on how the law sees it and the law probably does not have an exception for encrypted data.

BrightSilence · October 19, 2021, 7:43pm

EU law is notoriously unspecific. I’m almost certain it simply isn’t addressed. Most of it is based on legal precedent. Without that, nobody knows how to interpret the law. It doesn’t help that every member country has their own implementation of the EU guidelines. These laws are quite new, so the precedent usually doesn’t exist yet. Brave new world and all…

There really need to be better laws that understand zero trust architectures.

Toyoo · October 19, 2021, 9:11pm

The Storj protocol doesn’t exactly enforce encryption. The uplink client does, but it’s just client-side code. Nothing stops you from just getting rid of the encryption part.

If this was that easy, then any storage provider would just say “do E2E and you’ll be compliant”. Yet somehow this is not enough.

KernelPanick · October 20, 2021, 2:46am

The big names are all USA FedRAMP Authorized: Amazon S3, MS Azure. This is a good step toward compliance.

SA-9(8) External System Services | Processing and Storage Location — U.S. Jurisdiction Restrict the geographic location of information processing and data storage to facilities located within in the legal jurisdictional boundary of the United States. The geographic location of information processing and data storage can have a direct impact on the ability of organizations to successfully execute their mission and business functions. A compromise or breach of high impact information and systems can have severe or catastrophic adverse impacts on organizational assets and operations, individuals, other organizations, and the Nation. Restricting the processing and storage of high-impact information to facilities within the legal jurisdictional boundary of the United States provides greater control over such processing and storage. SA-5, SR-4.

https://csrc.nist.gov/CSRC/media/Publications/sp/800-53/rev-5/final/documents/sp800-53r5-control-catalog.xlsx

elek · October 20, 2021, 8:18am

Exactly. We can support any individual country (however, it’s reasonable only for countries with enough nodes) and pre-defined regions. I don’t think we should support hand-picked selections of countries (with 2^195 possible lists, as you wrote, that would require 195 bits instead of 16).

elek · October 20, 2021, 8:28am

Thanks for all the feedback. I think the summary so far is the following:

It’s a good idea to support more fine-grained placement policies.
The geoip based geofencing may not be enough for hard legal requirements.

I can agree with both. And for this reason the design has two important parts:

1. Extending the code base to make it possible to implement different types of placement policies / constraints.
2. Implementing a simple GeoIP-based constraint for EU/EEA/US.

The (1) will make it possible to support multiple different placement policies. Not just strict geofencing, but – for example – the suggested “use as many countries as possible” rule.
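To make the "multiple placement policies" idea concrete, here is a minimal sketch of what a pluggable policy interface could look like. `PlacementPolicy`, `GeoFence`, `MaxCountrySpread`, and the `Node` tuple are hypothetical names, not the API proposed in the blueprint.

```python
from typing import List, NamedTuple, Protocol, Sequence


class Node(NamedTuple):
    node_id: str
    country: str  # ISO country code, e.g. from a GeoIP lookup


class PlacementPolicy(Protocol):
    """Anything that can pick `count` nodes from a candidate list."""

    def select(self, nodes: Sequence[Node], count: int) -> List[Node]: ...


class GeoFence:
    """Strict policy: only nodes in the allowed countries are eligible."""

    def __init__(self, allowed):
        self.allowed = frozenset(allowed)

    def select(self, nodes, count):
        return [n for n in nodes if n.country in self.allowed][:count]


class MaxCountrySpread:
    """Spread pieces over as many countries as possible (round-robin)."""

    def select(self, nodes, count):
        by_country = {}
        for n in nodes:
            by_country.setdefault(n.country, []).append(n)
        picked = []
        # Take one node per country per round until enough are picked.
        while len(picked) < count and any(by_country.values()):
            for queue in by_country.values():
                if queue and len(picked) < count:
                    picked.append(queue.pop(0))
        return picked


candidates = [Node("a", "DE"), Node("b", "DE"), Node("c", "FR"), Node("d", "US")]
fenced = GeoFence({"DE", "FR"}).select(candidates, 2)
spread = MaxCountrySpread().select(candidates, 3)
```

Here `fenced` stays inside DE/FR, while `spread` picks one node each from DE, FR, and US. Both classes satisfy the same interface, so the selection code does not need to know which kind of policy it is applying.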

Using this API, we can continuously evaluate and improve this feature.

(2) may not be enough legally in some cases, but it’s a starting point, and we can improve it over time as we will have the API in place…

(There are also open questions which should be addressed later, like moving the data in case of an IP change, or IPv6 support.)

readonly · October 20, 2021, 10:06am

Hi folks,

In order to make the discussion meaningful in the EU context, could someone from the Storj team please comment on how they classify the Storj company, as well as the other stakeholders (SNOs, and customers who use the satellites to reach the SNO-provided network)? There are two main actors that have to comply, and this is really the essence of the law: Data Controller and Data Processor… period.

You should agree with your consultants, which party falls in which category.
Yes, in the US I know there is another wall to hit, the ability to provide the data upon request for investigation etc… but it is close, as if you cannot technically do it, this becomes nonsense. Sometimes it is worth to walk on the edge, instead of getting back to the stone age.

A lot to be said, but the first thing, as mentioned, should be crystal clear… otherwise, just implementing something like the hyperscalers’ datacenter approach limits the whole idea… we are now in the game of the Internet vs. Intranet (the latter is a kind of controlled, centralized distributed system)

Cheers!
Feel free to DM me to get into a deeper discussion… as I know a bit about that

kaloyan · October 20, 2021, 7:08pm

Reading these definitions, the “data controller” are Storj users/customers.

The satellite operator (e.g. Storj Labs) falls under the definition of “data processor”, but their access to users’ data is limited because the data is encrypted. However, some very generic data points could be extracted, such as the total size of data uploaded, number of files, etc.

SNOs should be neither data controllers, nor data processors as they have no clue about the data of specific users.

kaloyan · October 20, 2021, 7:15pm

Here is another perspective on this feature.

The Internet Archive mentioned at the latest DWeb Meetup that they are looking to store their data only in countries with a high democracy index. Their motivation is based on values rather than on regulatory compliance.