Sharing ML Dataset via read-only access grant

Hi Storj team. I’m working on a project right now where I would like to host a large ML dataset in Storj DCS and share access with a big team of collaborators (~100 people). We will be using rclone for file transfers. I plan to create an access grant with read/list permissions only for this dataset bucket and basically paste the access grant in our project README.md.

My question is, is there anything inherently wrong or insecure with this sharing strategy? The dataset isn’t proprietary and I don’t care if the access grant is shared with people outside the research group. Apologies if this is a simplistic question, but I figured this was the best place to bounce the idea around. Thanks for any feedback.

This is completely normal if you don’t mind sharing it with a wider group of people (I assume that your README.md is publicly accessible).
Access grants are designed to protect your original (root) macaroon: every access grant you generate is derived from it.
Moreover, you can always revoke an access grant.

However, I’m glad that you asked. I would suggest inspecting this access grant to make sure that it doesn’t include your root encryption key.

uplink access inspect <your access grant here or name of the access>

We have two access types: access grants and access keys. You can use uplink to register an access grant, which gives you a corresponding access key.

If you are going to share credentials publicly, I recommend using an access key and not an access grant. Even though an access key can’t be used with libuplink, an access grant has one significant flaw that we have yet to resolve.

If you share an access grant in a way that a malicious storage node operator can observe, then that storage node operator can use your access grant with a modified storage node and uplink and have both sides of the operation collude to lie and send your bill sky high. Essentially, a malicious storage node operator that has an access grant that points to data that lives on their storage node has the ability to modify an uplink to claim that huge amounts of data have been requested, without ever actually using any. In this scenario, the storage node gets paid through your project’s bill.

An access key can be used with hosted services such as link.<region>.storjshare.io or gateway.<region>.storjshare.io, but cannot be used with libuplink directly. When you run uplink share --url, that command implicitly creates the kind of access key that you’re looking for (read-only, public, etc.), and it should be safe to share without worrying about this attack. You can share folders and files this way. rclone, through the S3 integration, should also be able to contact a gateway using an access key like this.

This is a temporary solution to this collusion problem, and we’re still thinking about a better way to detect and prevent it.

Anyway, summary: please use an access key and don’t share your access grants.


Hello Storj experts! I found this thread because I have a very similar use case in mind, with the difference that I would like to do it programmatically (create a safe, read-only access key or just a download link for others). Any hints on how to do that?

Hello @wiku ,
Welcome to the forum!

Have you tried our bindings?

and also Storj - Third Party · GitHub

Hi @Alexey,

Yes, I’m playing a bit with the Java binding, compiled with the latest uplink-c, and at least the basic functionality I’ve tried works fine. I already know I can do something like this:

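        // Parse the serialized full access grant.
        Access fullAccess = Access.parse("...");
        // Derive a download-only access grant restricted to "test".
        Access limitedAccess = fullAccess.share(new Permission.Builder().allowDownload().build(), new SharePrefix("test"));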

but is it safe? I was just wondering about the potential security issue mentioned above:

a malicious storage node operator that has an access grant that points to data that lives on their storage node has the ability to modify an uplink to claim that huge amounts of data have been requested

How can I avoid it?

Hello @wiku,

To prevent a malicious storage node operator who has an access grant pointing to data on their storage node from claiming that huge amounts of data have been requested, you have to use an access key instead of an access grant. Create a restricted read-only access grant, then register it with an auth service such as https://auth.us1.storjshare.io to get a corresponding access key.

Implementation reference: storj/cmd/uplink/cmd/share.go at 9153b221fdcaa208bd2d55d16b4e0e73150026c7 · storj/storj · GitHub


Hi,

Ok, I finally managed to find some time to look at it and try something, and it worked. The missing piece was registering, which can be done by sending a POST request to https://auth.eu1.storjshare.io/v1/access:

{
	"access_grant": "<your_limited_access_grant>",
	"public": true
}

It returns the access key, secret key and gateway endpoint. The access key can later be used to build a URL like this (reverse-engineered a bit from what the Web GUI returns, but it works fine):


https://link.eu1.storjshare.io/s/<access_key>/<bucket>/<file>?download
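For anyone who wants to build that link in code, here is a minimal Java sketch of the URL pattern above. The class and method names (ShareLinkSketch, downloadUrl, encodeSegment) are made up for illustration, the eu1 host is simply the one from this post, and percent-encoding each path segment is my own assumption about how names with spaces should be handled, not something the linksharing service documents here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Storj library.
class ShareLinkSketch {
    // Host taken from the post above; other regions would use a different host.
    static final String LINK_BASE = "https://link.eu1.storjshare.io/s/";

    // URLEncoder produces form encoding, where a space becomes "+".
    // For a URL path we want "%20" instead (a literal "+" is already "%2B").
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds <base>/<access_key>/<bucket>/<file>?download,
    // encoding each "/"-separated part of the object key separately.
    static String downloadUrl(String accessKey, String bucket, String objectKey) {
        StringBuilder path = new StringBuilder();
        for (String part : objectKey.split("/")) {
            path.append('/').append(encodeSegment(part));
        }
        return LINK_BASE + encodeSegment(accessKey) + "/" + encodeSegment(bucket) + path + "?download";
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the access key returned by registration.
        System.out.println(downloadUrl("<access_key>", "my-bucket", "data/train set/a.csv"));
    }
}
```

Only the access key goes into the link; the secret key returned by registration is not part of this URL pattern.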

I guess the exact API might be subject to change in the future but at least that’s the current version which works for me. I assume there’s also a way to unregister/revoke access using the secret key, I’ll look into that later.
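In case it helps someone else, the registration call described above could be sketched in Java roughly like this, using only the JDK's java.net.http classes (Java 11+). The class and method names are hypothetical, the Content-Type header is my assumption (the thread only shows the JSON body), and the sketch only builds the request; sending it and parsing the response are left out because the response format is not spelled out here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of any Storj library.
class RegisterAccessSketch {
    // Endpoint quoted in this thread; it may change in the future.
    static final String AUTH_URL = "https://auth.eu1.storjshare.io/v1/access";

    // Builds the JSON body with the two fields shown in the thread.
    // Escaping is minimal: only backslashes and double quotes are handled.
    static String buildBody(String accessGrant, boolean isPublic) {
        String escaped = accessGrant.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"access_grant\":\"" + escaped + "\",\"public\":" + isPublic + "}";
    }

    // Builds (but does not send) the POST request.
    static HttpRequest buildRequest(String accessGrant, boolean isPublic) {
        return HttpRequest.newBuilder(URI.create(AUTH_URL))
                .header("Content-Type", "application/json") // assumption, see above
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(accessGrant, isPublic)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("<your_limited_access_grant>", true);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody("<your_limited_access_grant>", true));
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Pass only the restricted, read-only access grant here, never the full one, since the resulting access key is meant to be shared.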

So overall the functionality looks very promising. Thank you @rikysya and @Alexey, I really appreciate your help!
