Anonymized reads and writes

rhodey · November 20, 2024, 3:07am

Imagine I have a special application running in a secure compute environment, and the application requires persistent storage. The application has encryption keys, so it's not a problem to encrypt everything before writing.

However, this application cannot use Amazon S3, because the developer who runs it wants to be able to tell users that the dev cannot spy on the application. That includes not being able to enable S3 server access logs and infer things about the application state from bucket access patterns.

what I would like is a Storj bucket option which says “never allow me to turn on logging features for this bucket”

I understand that if the storage nodes in the network all got together and coordinated, they could possibly learn a thing or two about the access patterns, but this is not a totally unacceptable condition, and maybe something could be done to prevent that too

rhodey · November 20, 2024, 3:19am

some more thoughts & questions

if I have a file and it gets split into parts and sent to multiple storage nodes for redundancy, can these storage nodes prove to each other that they have one or more of the same parts?

when I ask for an object in bucketA with pathB, that request, or at least parts of it, ends up reaching multiple storage nodes so they can return the data. can these storage nodes prove to each other that they received requests for the same pathB?

nerdatwork · November 20, 2024, 3:22am

You should read the documentation while you wait for someone to answer your specific questions.

rhodey · November 20, 2024, 3:23am

I think maybe the storage nodes don't have enough information to put an anonymity-seeking app at risk, but I think it makes sense that the network coordinators (satellites?) probably have enough information / see enough request patterns to maybe learn some things

rhodey · November 20, 2024, 3:27am

Hi — OK I have been reading the docs

One thing I know for sure is that there is no flag when you create a bucket which says "disable bucket metrics forever", so this is something I know I have to ask for if I want it to happen

As for how much the nodes know and how much the satellites know: like I said, I have been reading, and the docs actually sent me here. I will keep reading and maybe find the answer in a few days, or maybe someone who likes Storj and likes sharing what they know will write back and give me more confidence in whatever information turns up

Alexey · November 20, 2024, 4:03am

Our implementation of the S3-compatible protocol doesn't support this feature, so it is disabled by default and cannot be enabled via an S3 request. We also do not log anything by default; for the native Storj protocol it is not even possible.
However, logging for GatewayMT can be enabled by a support request, and we would not have access to these logs, because you would use your own access grant for the logging. The logs would be stored in your bucket and only you would have access to them. You may of course generate a share link to that bucket and give it to someone, but it is not generated automatically and cannot be discovered even if it is generated; you must explicitly give that link out yourself.

I do not think so, especially if you use the S3-compatible GatewayMT: all requests to the nodes would come from the distributed GatewayMT instances, so it is really not possible to identify a pattern or to group requests by a potential customer, even if the node owners coordinated their spying.
A node doesn't have any information about pieces, customers, buckets, or content. Everything is encrypted with the customer's encryption keys. Each node holds only one of the required pieces (e.g. 1 out of 80) of one segment of the file. So even with the native Storj protocol, where the customer contacts all the needed nodes directly, the nodes could not identify a pattern or group requests by customer. They would get only IP addresses.
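To make the "one piece per node" idea concrete, here is a minimal toy sketch of k-of-n erasure coding. It is not Storj's implementation: the prime field GF(257), the function names `encode` and `decode`, and the 3-of-5 setting are all assumptions chosen to keep the example short. It only shows the general property that a segment (already encrypted by the client) is spread over n pieces and any k of them are enough to rebuild it.

```python
# Toy k-of-n erasure coding over the prime field GF(257).
# Illustration only: the field, names, and parameters are assumptions,
# not Storj's actual Reed-Solomon implementation.

P = 257  # smallest prime above 255, so every byte value fits in the field


def _interpolate(points, x0):
    """Evaluate at x0 the unique polynomial of degree < len(points)
    passing through the given (x, y) points (Lagrange interpolation mod P)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total += yi * num * pow(den, -1, P)  # modular inverse, Python 3.8+
    return total % P


def encode(data: bytes, k: int, n: int) -> dict:
    """Return n pieces, keyed 1..n, such that any k of them rebuild data."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    pieces = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(padded), k):
        # The k bytes of a block are the polynomial's values at x = 1..k
        points = [(j + 1, b) for j, b in enumerate(padded[i:i + k])]
        for x in pieces:
            pieces[x].append(_interpolate(points, x))
    return pieces


def decode(available: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k pieces."""
    if len(available) < k:
        raise ValueError("need at least k pieces")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, piece[pos]) for x, piece in chosen]
        # Re-evaluate the polynomial at x = 1..k to recover the block
        out.extend(_interpolate(points, t) for t in range(1, k + 1))
    return bytes(out[:length])


# Usage: the input stands in for client-side ciphertext of one segment.
segment = b"ciphertext of one segment"
pieces = encode(segment, k=3, n=5)
survivors = {x: pieces[x] for x in (2, 4, 5)}  # two pieces are lost
assert decode(survivors, k=3, length=len(segment)) == segment
```

In this toy, each "node" would hold one entry of `pieces`. Since the input is ciphertext, a single piece carries neither readable content nor any path or bucket name.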

Why do you want to re-implement the Storj native protocol instead of just using it?
See:

No, they are not. Nodes have zero knowledge about the content, including paths, buckets, customers, whatever.
The same is true for the satellite: the customers' data is encrypted with their encryption keys.
Even GatewayMT has no way to view this information, despite the fact that you store your access grant there, because the grant is also encrypted. When you access GatewayMT, you provide an Access Key each time to decrypt the access grant, which then makes your data available. For public access the Access Key alone is enough, but for private access you also need the Access Secret Key.
See

rhodey · November 21, 2024, 2:41am

Hey @Alexey !

Thanks for writing back!

I did not mean to imply that I would be breaking the file into parts myself. I just meant to allude to what I learned Storj does, so as to include it in my formulation

All the information you provided points to Storj being a good fit for an application I have in mind. I will review these docs more deeply

Can you tell me, though, whether the original Storj white paper is a good reference and still up to date enough? I have found the documentation quality for Storj is actually very good, but my feedback is that the information is maybe in too many places, so I don't feel like I'm reading things in the best order

Hopefully the white paper is still relevant. I will try that

Thanks!

Alexey · November 21, 2024, 5:54am

Of course! Our docs describe the current implementation of the protocol, which follows the whitepaper.
So for practical usage it's advisable to follow our guides on https://storj.dev/
You can find the concepts there too, or read the whitepaper (it goes into more detail on the math and on how things should work). For a brief overview I suggest reading our blogs: