Can an attacker store potentially incriminating data unencrypted on a node?

Toyoo · February 26, 2021, 7:35am

Sure, Reed-Solomon would need adaptation, that’s why I wrote about it. Some starting point would be to use a scheme (k, n+k), where the first k pieces are never used. There would be further modifications necessary though.

These would still be unencrypted for transfer though, and if some legal representative knocks on your door and tells you that you were serving illegal content, there will be no time to consult Storj policies and such. Not every place has an equivalent of a safe harbour or DMCA.

Frankly speaking, I’m now considering leaving the network.

jammerdan · February 26, 2021, 7:45am

Yes, disk encryption is just protection on the local disk level.

Alexey · February 26, 2021, 8:27am

How could it protect you from DMCA and co.?
You will have encryption keys.

jammerdan · February 26, 2021, 10:24am

How do you mean?

BrightSilence · February 26, 2021, 8:59pm

If you’re trying to avoid liability, the argument “but I encrypted it!?” isn’t going to help you… Otherwise all criminals would just encrypt their stuff to avoid liability. Luckily it doesn’t work like that.

jtolio · February 26, 2021, 9:23pm

Hey friends!

So a couple of things on “FrameUp” and the potential impact of illegal data stored on the network, in general.

The FrameUp paper was written analyzing Storj v2. Storj v2 had a very different situation around replication and encryption than v3. Most of the issues FrameUp discusses are not possible with v3 - all data on storage nodes is stored encrypted and then Reed Solomon encoded.

I think this problem is unlikely to be a big deal (especially with the v3 changes) and I’m with @monty in that if someone did want to do this, there are higher impact ways to do it that don’t involve us.

That said, @570RJ and @BrightSilence are right - a modified Uplink could be created to disable encryption and the first k Reed Solomon pieces are just splits of the original data (or the modified Uplink could just avoid Reed Solomon entirely), so it is still possible.
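
To illustrate the point about the first k pieces, here is a minimal Python sketch of a systematic erasure code. It is not Reed Solomon and not Storj's implementation: it substitutes a single XOR parity piece for real parity pieces, and the name `split_systematic` and the sample payload are made up for illustration. What it shares with a systematic Reed Solomon code is the property at issue, namely that the data pieces are literal slices of the input.

```python
from functools import reduce

def split_systematic(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal pieces and append one XOR parity piece.

    Hypothetical stand-in for a systematic erasure code: the first k
    pieces are plain slices of the (zero-padded) input.
    """
    piece_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(piece_len * k, b"\0")   # pad so pieces are equal length
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    # Parity byte j is the XOR of byte j of every data piece.
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))
    return pieces + [parity]

original = b"plaintext that was never encrypted"
pieces = split_systematic(original, k=4)
first_piece = pieces[0]                          # a readable slice of the input
rejoined = b"".join(pieces[:4]).rstrip(b"\0")    # data pieces concatenate back
```

If the input was never encrypted, whoever holds `first_piece` holds readable data, which is why the encryption step in the Uplink matters.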

Additionally, none of the countermeasures suggested in the FrameUp paper really work. The FrameUp paper discusses measuring entropy, etc, but all of these styles of countermeasures are defeated if the “incriminator” simply stores illegal data encrypted, but with a widely shared encryption key.
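
A rough Python sketch of why an entropy check is defeated this way. The byte-level Shannon entropy function is standard; everything else is an assumption made for illustration: `toy_keystream_xor` is a toy SHA-256 counter keystream, not a real cipher and not anything Storj uses, and the key and payload are invented.

```python
import hashlib
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy cipher, illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

plaintext = b"incriminating text " * 500
shared_key = b"key posted on a public forum"   # hypothetical widely shared key
ciphertext = toy_keystream_xor(shared_key, plaintext)

low = shannon_entropy(plaintext)     # repetitive text: low entropy
high = shannon_entropy(ciphertext)   # looks random to an entropy check
decrypted = toy_keystream_xor(shared_key, ciphertext)  # anyone with the key reads it
```

The node sees only `ciphertext`, which passes any entropy threshold, yet everyone who knows `shared_key` can recover the original.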

There are other countermeasures the FrameUp authors didn’t propose. One such would work a bit better, but would add more resource usage and load to storage nodes. We could design the system such that storage nodes encrypt data they receive at rest with a random encryption key, and then return the encryption key to the storer and thereafter throw it away. This would mean that all data on the storage node would be encrypted (and not with a widely known key), and the storage node wouldn’t be able to decrypt it. The downside is there is more metadata overhead (the Uplink and Satellite need to keep these keys) and there is more resource usage (a storage node like a Raspberry Pi with unaccelerated symmetric encryption could be impacted more than it already is). If we did this, we’d want to make a very careful encryption scheme selection that weighed the likelihood of this attack against the potential performance impacts.
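
A minimal Python sketch of that flow, under loud assumptions: `ToyNode`, `store`, and `fetch` are hypothetical names, not part of any Storj API, and the cipher is a toy SHA-256 keystream used only so the example runs with the standard library. A real design would pick a vetted symmetric scheme.

```python
import hashlib
import secrets

def toy_stream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy cipher, illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

class ToyNode:
    """Hypothetical storage node that encrypts at rest and forgets the key."""

    def __init__(self) -> None:
        self._stored: dict[str, bytes] = {}

    def store(self, piece_id: str, data: bytes) -> bytes:
        key = secrets.token_bytes(32)               # fresh random key per piece
        self._stored[piece_id] = toy_stream_xor(key, data)
        return key                                  # returned to the storer; no copy kept

    def fetch(self, piece_id: str) -> bytes:
        return self._stored[piece_id]               # node can only return ciphertext

node = ToyNode()
piece = b"customer piece bytes"
key = node.store("piece-1", piece)       # Uplink/Satellite must now keep this key
at_rest = node.fetch("piece-1")
recovered = toy_stream_xor(key, at_rest)
```

The extra metadata is visible here: every stored piece yields one more key that someone other than the node has to track.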

This solution itself is not without flaws - the incriminator could simply widely share the key the storage node returned. Further, even if that problem were able to be solved, a traffic analysis would still reveal which nodes store which accessed data unless we directed all traffic through Tor, which would definitely be prohibitively performance restricting. I’m not sure there’s a fully general solution to the problem, if the main goal of the network is to store data for others and let them access it how they want.

So, summary, we think the likelihood is low and we don’t view this as a big risk, and the changes to v3 have certainly helped to some degree. There is a small amount more we could do technically, but additional work comes with tradeoffs we would need to evaluate, and there is still no panacea here. To some degree, these problems are endemic to any form of cloud storage, and participants in a distributed or decentralized storage system are no exception.

We recognize there is some risk to storage node operators, and we’ll continue to review our agreements and ensure that we are providing SNOs the greatest legal protection possible. Customers storing data, whether legal or not, is our problem, and should not be SNOs’ problem. We do intend to stand by and support our storage node operators, especially in these ways where older rules collide with newer technology.

jammerdan · February 26, 2021, 9:44pm

It surely depends on the jurisdiction you are living in. I can refer to Germany, maybe it is different in the Netherlands.
If you get accused, the accuser has to prove with evidence. If your data is encrypted it is much harder to prove the existence of illegal content if not impossible. Here there exists a privilege against self incrimination which includes that there is no legal way to request an accused subject to reveal passwords to an encrypted storage. So if your disks get confiscated but they are encrypted, the data can not be accessed, so much harder to prove anything.

But for SNOs it makes a huge difference if illegal content is stored unencrypted on their disks or encrypted with n people having the key to it.

I believe this is what it should be and honestly from Storj Labs claims I would have expected this: Data is always encrypted and SNO cannot decrypt it.

Again: This is a different situation for the SNO. The data is encrypted.

How about a Tor opt-in for SNOs who would prefer that?

The problem is, when they come and knock on my door and seize my entire electronics for months or years, it is my problem no matter how much you claim you stand by me.

jtolio · February 26, 2021, 9:57pm

Customer data is always encrypted. A customer has to rip out that encryption deliberately for any of this conversation to matter.

I don’t see this as an offering we will provide soon if ever. It is simply too much of a performance hit. We’re trying to compete with existing cloud providers and are working to be faster than we currently are, not slower. Tor explicitly requests that users don’t do bandwidth intensive operations.

Overall, this whole thread has provided good feedback. We’ll continue to think about this problem. We thought about it during the design of v3. We’ll re-evaluate some of our decisions if that seems prudent.

570RJ · February 26, 2021, 10:06pm

@jtolio how can you claim that when there is nothing enforcing it? Are you saying that you know that none of the customers are using a modified client?

jtolio · February 26, 2021, 10:09pm

Okay, you’re right that I should have said “unless explicitly, deliberately removed and disabled, customer data is always encrypted.”

jammerdan · February 26, 2021, 10:14pm

A customer? As far as I understand, it is about the implementation. Can Storj Labs guarantee that when I am using FileZilla or other 3rd party implementations, my data gets encrypted?

So it rather appears that uplink cannot be trusted by any means.

Maybe you should. The difference is: if illegal content gets served from Google Cloud, AWS or Azure storage, even the dumbest investigator will realize that it is cloud storage. This is very different for SNOs, who appear as sole home users with their private IPs and are thus much more prone to being investigated individually if there is suspicion of illegal activities.

BrightSilence · February 26, 2021, 10:28pm

I appreciate the in depth response and thought that went into it @jtolio!

While this doesn’t stop SNOs from being an unwilling part it might provide enough legal coverage to avoid liability. If something is stored unencrypted it might be possible to argue SNOs could plausibly know what they store. But it won’t be so easy to argue a SNO could reasonably know that one of the millions of encrypted pieces was encrypted with a widely known key.

I fear a solution to this won’t be either legal, technical or procedural, but rather a combination of the three.

A technical solution to limit the possibility of this occurring, plus a take down procedure should it occur and be detected or should a SNO be notified of illegal data, might just provide enough legal coverage to avoid liability.

As for additional encryption on the SNO side: this doesn’t really seem to solve any issues that the entropy detection wouldn’t solve either. Keys could still be spread, and additionally I’m not sure whether your liability doesn’t become an issue again if the data was initially sent to your node unencrypted. “I threw away the key” doesn’t sound like a particularly strong defense. At least not nearly as strong as “I never had the key to begin with”. I also acknowledge the significant downside of additional processing needs on the SNO end, especially for those without hardware accelerated encryption capabilities. That said, I appreciate that some thought had already gone into technical solutions that might help.

Realistically, though, the only way they would even get to that point is because they already know you’re storing this data. If you yourself don’t even know, the only thread leading to you is the network itself. So they would know based on either the uplink or perhaps link sharing. The proof that your IP is serving that data would be there before they ever knock on your door. At that point, “but encryption” is only an argument if that encryption prevented you from possibly knowing what the data was, which isn’t the case if it’s you who holds the keys.

Pentium100 · February 26, 2021, 10:36pm

Either of those would be better than storing stuff unencrypted and potentially accessible to the SNO. Even if the key is widely accessible, wouldn’t they have to prove that I knew that and that I could therefore have known what’s on my node? Though I am not a lawyer, so I don’t really know.

How about the customer generating a public/private key pair and giving the public key to the node, which then encrypts the data with it? There could even be a database of the keys used, and the node would refuse to encrypt a second piece with the same key. This would result in even bigger CPU usage, since this would be asymmetric encryption. But it would make it impossible for the node operator to decrypt the second layer without taking some deliberate actions, like looking for the widely available key and trying to match it with one of the pieces.
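
The "database of the keys used" part could look roughly like the Python sketch below. It covers only the bookkeeping, not the asymmetric encryption itself, and `KeyRegistry` and `accept` are hypothetical names invented for this example.

```python
import hashlib

class KeyRegistry:
    """Hypothetical per-node record of public keys already used once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, public_key_bytes: bytes) -> bool:
        """Return True the first time a key is offered, False on any reuse."""
        fingerprint = hashlib.sha256(public_key_bytes).hexdigest()
        if fingerprint in self._seen:
            return False          # refuse a second piece under the same key
        self._seen.add(fingerprint)
        return True

registry = KeyRegistry()
first = registry.accept(b"public-key-A")    # new key: accepted
second = registry.accept(b"public-key-A")   # same key again: refused
third = registry.accept(b"public-key-B")    # different key: accepted
```

Storing fingerprints rather than whole keys keeps the per-piece overhead small, though the registry only sees reuse on the node that holds it.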

I like this idea too. It would make it more difficult to make one of the pieces “plaintext”.

jammerdan · February 26, 2021, 10:43pm

Well they knock on your door to find evidence.
By encrypting the disk they have no access to the evidence. That’s all.
Here it would mean the probability of getting sentenced would decrease dramatically.

BrightSilence · February 26, 2021, 10:45pm

This solution doesn’t really work. It’s easy to work around it and you could just manipulate the encoding so that the later pieces contain the data you want to put on nodes. It requires an extra step, but is not impossible. And it would also be possible to just upload whatever piece you want and not bother with correct reed solomon encoding at all. The data will still remain on nodes, it will just cause failed audits and repairs.

Toyoo · February 27, 2021, 10:51am

Just as I said, there’s more to do here. But it’s Storj’s job to figure it out, not mine. Essentially the protocol would have to introduce two steps:

the customer proving to the node that what the node is receiving is actually the k-th stripe of a chunk, and nothing else,
the satellite actively seeking out chunks whose stripes, as stored on nodes, do not compose into a proper chunk, i.e. where the customer has uploaded something that did not follow the erasure code used in the network.

I believe both could be implemented at a negligible performance cost.
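
The second step can be sketched in Python as a consistency check over the stored pieces. This is an illustration under stated assumptions, not the network's erasure code: a single XOR parity piece stands in for real parity pieces, and `xor_parity` and `composes_properly` are hypothetical names. Note that this naive version reads every piece in full.

```python
from functools import reduce

def xor_parity(pieces: list[bytes]) -> bytes:
    """Parity byte j is the XOR of byte j of every piece (toy erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))

def composes_properly(data_pieces: list[bytes], parity_piece: bytes) -> bool:
    """True if the stored pieces are consistent with the toy code."""
    return xor_parity(data_pieces) == parity_piece

honest_data = [b"AAAA", b"BBBB", b"CCCC"]
honest_parity = xor_parity(honest_data)

# An uploader swaps one piece for arbitrary content without re-encoding.
forged_data = [b"AAAA", b"evil", b"CCCC"]

honest_ok = composes_properly(honest_data, honest_parity)
forged_ok = composes_properly(forged_data, honest_parity)
```

The check flags the forged set because the swapped piece no longer agrees with the parity computed from the honest encoding.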

Also,

might not be exactly necessary. Nodes themselves could act as an overlay network, given that they’re already managed in a way that mirrors management of Tor nodes. Obviously at a cost though, because SNOs will expect to be paid for that.

Storgeez · February 28, 2021, 10:40am

Data could be doubly encrypted for example, first with customer’s key, then require the customer to encrypt that with Storj’s key prior to sending for storage. Encrypt it in such a way that the encryption of individual pieces can be verified, Storj keys are sent to all nodes and, upon receipt, the nodes verify the encryption is present and discard the key.
I don’t know if there is a way to encrypt the data while enabling a single piece to be decrypted on its own; if not, pieces would need to be encrypted individually.

This guarantees that data stored on storage nodes is encrypted. It is more complex, though. And the attacker can still send unencrypted incriminating data to the node, but the data would never end up stored because the node would discard it as invalid.

This would be very expensive: each piece would need to be downloaded. At that point it would make sense to upload the data to the satellite itself rather than directly to nodes, which defeats the idea of Storj.

Toyoo · February 28, 2021, 1:51pm

Only as expensive as the usual audit procedure.

Storgeez · March 1, 2021, 3:50pm

The usual audit procedure uses a few megabytes of traffic a month…

Pac · March 1, 2021, 7:17pm

I’m wondering something: currently, if an SNO ends up with some illegal unencrypted files on their node’s storage disk, they could simply be accused of having put these illegal files there to camouflage them in the sea of proper STORJ files…

How would an SNO prove that these files do come from the Tardigrade Network? Do satellites keep a history for some time (let’s say for the past 12 months for instance) of all files (i.e. metadata only) that were uploaded to SNOs’ nodes at some point? So StorjLabs could provide evidence that the file came from the Tardigrade network even if at the time of the check by the authorities it could have been deleted from the Tardigrade network?

EDIT: Seems like storing non-STORJ files wouldn’t be such a problem in the end: they would automatically get deleted by the garbage collection (as @BrightSilence reminds us further below).