Geofencing and advanced placement-constraint support #4227

Interesting map of a definition of democracy … I thought that the US is the most advanced… but this seems to be off topic. For sure a company located, for example, in Russia is a good match as a customer for Storj, and nobody wants to exclude it as a prospect :wink:

Reading these definitions, the “data controllers” are Storj users/customers.

I am not sure; the main concern, and what this law talks about, is the personal data, not the actual storage utilization.

But let’s agree, we are neither lawyers nor Data Protection Officers.

From a technical perspective I completely agree. Will this also be interpreted as such from a legal perspective?

Although I am not a lawyer, I deal with data protection laws in my career. The controllers (those responsible for the data) choose the “data processors”, and we are one of them, regardless of whether we have a clue. A data controller needs to validate that the data processors are able to abide by the constraints of the law. So it might not be our (Storj network) liability unless we make promises we can’t deliver on. Simply put, EU personal data should stay in the EU unless the right controls are in place.


I was wondering if it is under consideration that a user will be able to change the geo restrictions for data that has already been uploaded. Or will a user be expected to download and re-upload in such a case? With customers holding terabytes of data, this could easily become a real burden. Instead, a feature similar to repair or graceful exit could be implemented to move the data as a background process, respecting the user’s changed geofencing requirements.

It has been considered, but it is not needed for this first MVP. We have ideas for how we can improve the system later on. Best would be for customers that need geo restrictions to just contact us so that we can discuss different solutions with them. As long as no customer is requesting it, we can only guess, and that is usually not the best approach. For now we make sure the current MVP leaves us enough room for improvements later on.


It might not be needed initially, but if it is being considered, then the system could be built in a way that allows adding something like that later. I am pretty much convinced that customers would prefer a background service moving pieces over downloading and re-uploading all of their data.

Where on the homepage does it tell potential customers that you are open to discussing and adapting Storj DCS to their needs? So how should potential customers know?
I don’t believe this is how it will work. I am afraid that potential customers will rather skip Storj DCS if the geofencing solution does not meet their expectations. And looking at the competitors (Backblaze, AWS, Wasabi etc.) and the regulatory requirements, there should be no doubt that a geofencing solution is mandatory for certain industries before Storj DCS would even be considered as a storage solution. There should be no question about that.


The constraint should be saved both at the bucket level (used as a default for every new segment) and at the segment level. During the segment repair and graceful-exit processes we have access only to the segment information, therefore we will save the placement constraint information to the segment table too.
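A minimal sketch of what that denormalization could look like, with invented `Bucket` and `Segment` types and placement constants (the real storj/storj schema and constants differ): the bucket’s default placement is copied onto each segment at commit time, so processes that only see segments still know the constraint.

```go
package main

import "fmt"

// Placement identifies a geofencing constraint; 0 means "no constraint".
// These values are illustrative, not Storj's actual constants.
type Placement int

const (
	EveryCountry Placement = iota
	EU
	EEA
)

type Bucket struct {
	Name             string
	DefaultPlacement Placement // bucket-level default for new segments
}

type Segment struct {
	StreamID  string
	Position  int
	Placement Placement // denormalized copy, written at commit time
}

// commitSegment copies the bucket's default placement onto the new segment,
// so repair and graceful exit can read it without a bucket lookup.
func commitSegment(b Bucket, streamID string, pos int) Segment {
	return Segment{StreamID: streamID, Position: pos, Placement: b.DefaultPlacement}
}

func main() {
	b := Bucket{Name: "eu-medical-records", DefaultPlacement: EU}
	s := commitSegment(b, "stream-1", 0)
	fmt.Println(s.Placement == EU) // prints true
}
```

The trade-off discussed further down in the thread is exactly this: an extra column per segment row, in exchange for never joining the segments table against the bucket table during repair.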

Doesn’t the satellite have the bucket information during segment repair? Is this a limitation of the current implementation, or is there a reason that repair and graceful exit can’t be made aware of geo restrictions through some mechanism other than the segment information?

That would be too granular, and then we would need to give metainformation to the limited audit and repair workers. This would slow down the process significantly.
The audit and repair workers operate on segments, not buckets. The same goes for storage nodes during Graceful Exit: the transfer happens between nodes, so nodes operate at the piece level and cannot have any information about buckets. They do not even have information about segments. The satellite will just give them the list of nodes to which pieces should be transferred.
So the minimal entities for the satellite regarding data (not metadata) are segments.
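To illustrate why the segment-level constraint is enough, here is a toy sketch (the types, constants, and country sets are made up for illustration and are not Storj’s actual node-selection code): given only a segment’s placement value, the satellite can filter candidate nodes by country without ever touching bucket metadata.

```go
package main

import "fmt"

// Placement is an illustrative geofencing constraint, not Storj's real type.
type Placement int

const (
	EveryCountry Placement = iota
	EU
)

// allowedCountries maps a placement to the countries it permits.
// A real implementation would cover all EU members; this is a toy subset.
var allowedCountries = map[Placement]map[string]bool{
	EU: {"DE": true, "FR": true, "NL": true},
}

// selectNodesForRepair filters candidate nodes using only the segment's
// own placement value -- no bucket lookup is needed.
func selectNodesForRepair(placement Placement, nodeCountry map[string]string) []string {
	var out []string
	for nodeID, country := range nodeCountry {
		if placement == EveryCountry || allowedCountries[placement][country] {
			out = append(out, nodeID)
		}
	}
	return out
}

func main() {
	nodes := map[string]string{"n1": "DE", "n2": "US", "n3": "FR"}
	selected := selectNodesForRepair(EU, nodes)
	fmt.Println(len(selected)) // prints 2 (the DE and FR nodes)
}
```

The storage nodes themselves never see this logic; as the post above says, they only receive the resulting node list from the satellite.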

16 posts were split to a new topic: I wonder how the recent ruling of the Bavarian state court regarding Google Fonts and GDPR affects Storj and this proposal?

I would add that repair workers are even hosted elsewhere and run independently of the satellite core systems.
More info on that here: storj/docs/blueprints/trusted-delegated-repair.md at 778e7e100d637e491f19bf377e8d8a2ebdba3096 · storj/storj · GitHub

Not sure how it would be related to this proposal specifically. But the ruling sure is interesting, and I’m not entirely sure why Google Fonts is singled out here. Your IP is literally shared with any third-party domain you include any type of content from. This includes images, fonts, videos, embedded frames etc. It seems like the ruling has a much broader scope than just Google Fonts, based on the reasoning for it.

So yeah, same here. No reason to single out one specific thing. Based on their arguments this would apply to any resource pulled from a third-party domain (which, btw, impacts pretty much any website in existence).

In all of these cases though they ask the satellite to give them limits in order to do the transfers, ultimately isn’t it the satellite that is doing the geofencing (i.e. actually choosing the nodes to store on) as well?

What you’ve done is denormalize the placement column into the segments table. Wouldn’t it be more general, and perhaps more useful, to denormalize the bucket (or some other representation of the bucket) instead? What other settings or policies might be set at the bucket level in the future that the satellite would need to tie to segments? You’ll have to add a column for each one.

Yes, but you should imagine the massive tables of segment data this requires accessing.
Sets for repair or graceful exit aren’t based on buckets, but on whether segments are below the repair threshold, or on all segments for a specific node independent of buckets. These are large queries to begin with.
Adding bucket information requires joining those massive tables against the bucket table, which would have a massive impact on performance and resource usage. If adding that data to the segment table can avoid that join, it would be well worth it.

I’m guessing objects needing repair and graceful exits are a small subset of all data. You only have to look it up for them. There probably isn’t an index on the segments in the objects table, though, and I don’t think it would be good to add that index to the database directly.

Repair is a daily process, though? That means you can easily loop over the objects table and build a one-off segment → bucket data structure before you run repair, then use that to look up the bucket, from which you can easily fetch the bucket policy. Objects stored in the window between when you built the index and when the repair process scans won’t need repair, because they were only just stored.
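A rough sketch of that idea, with invented types rather than Storj’s actual metainfo API: scan the objects table once, build an in-memory stream-to-bucket map, then resolve each repair candidate’s bucket (and hence its policy) through that map instead of joining per row.

```go
package main

import "fmt"

// Object is a simplified stand-in for a row in the objects table.
type Object struct {
	StreamID string
	Bucket   string
}

// buildStreamToBucket does a single pass over the objects table and
// returns a one-off lookup from stream ID to bucket name, built just
// before the repair run starts.
func buildStreamToBucket(objects []Object) map[string]string {
	idx := make(map[string]string, len(objects))
	for _, o := range objects {
		idx[o.StreamID] = o.Bucket
	}
	return idx
}

func main() {
	objects := []Object{
		{StreamID: "s1", Bucket: "bucket-eu"},
		{StreamID: "s2", Bucket: "bucket-any"},
	}
	idx := buildStreamToBucket(objects)
	// The repair worker resolves a segment's bucket policy via the map
	// instead of joining the segments table against the objects table.
	fmt.Println(idx["s1"]) // prints bucket-eu
}
```

Whether holding such a map in memory beats an extra column per segment row is exactly the open question in the next post.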

The question is would that be cheaper and more efficient than storing another column on every segment? Maybe, maybe not.

I also agree that Storj will never be able to comply.
What about SNOs that route all the traffic to their nodes through a VPN? The IP of the VPN provider could be in an approved country while the actual data lives in another.

Maybe Storj could implement something like a written contract with all the SNOs where they swear not to move the HDDs, and demand some kind of proof of residence à la KYC? KYS (know your SNO)?

My guess is… Storj is a good project which provides a steady stream of income, and is simple enough and profitable even for beginners.

Storj is growing fast and STORJ to the moon!
