A service professional SNOs could provide to customers who want to transfer huge amounts of data onto Tardigrade

@jocelyn: And here is my idea from the other thread…

When sourcing some links for my digital cinema thread, I stumbled over this Amazon link and thought it was an interesting idea.

If Storj really manages to attract customers with huge data volumes to transfer onto the Tardigrade platform, professional SNOs with huge bandwidth (like data centers) could become upload providers for Tardigrade.
The basic idea: a new customer with, say, hundreds of terabytes of data receives data HDDs from the professional SNO, puts his data on the disks, and sends the disks to the data center. The data center then takes care of the upload with the huge bandwidth it can provide.
I really don’t know if this makes sense, but the idea behind it would be that the professional SNO has a better internet connection and can transfer the customer’s data onto Tardigrade much, much faster. For this service, the professional SNO could charge the customer extra.


What about security, reliability and privacy issues with that approach?

Backblaze offers something like that, but would you trust some SNO with it? Storjlabs maybe, but not an SNO. How can I be sure everything on my drive was properly uploaded (even if I encrypted all data beforehand, so there is no privacy issue)? And how can it be uploaded at all? You need an encryption passphrase to upload the data. Would you give that to an SNO? Or to Storjlabs itself? I don’t know… I see the benefit in the idea of getting lots of data onto Tardigrade fast, but the context seems difficult.

Providing a service of “uploading data to some online storage” and running a Storj node do not depend on each other at all.

If someone has a lot of data that he wants to upload to an online storage service he could probably find someone to do that - whether that person also runs a Storj node does not matter at all.


Let me first give you a link showing that Storjlabs has at least thought about so-called commercial SNOs or data center partners in the past: Commercial SNO & Data Center Partners vs Decentralized storage

True, ‘some’ SNO might not be trusted, but ‘some’ SNO also would not have the ability or resources to fulfill all requirements for such upload tasks, including trust. Would I trust some home SNO? No, not at all. But would I trust a reputable data center in my country that happens to be a Tardigrade uploader, something like (just throwing names out here) Hetzner or OVH, or at least something that I can google? Yes, I guess that could work.

I think those are all merely technical questions. I don’t have the right answers for them. But I was thinking of hardware encryption for the HDDs and uploading with public/private key encryption, so that a customer’s passwords don’t have to be revealed.
Of course Storj could help by creating container-like import formats and technical workflows for automation. Maybe something like this:

  1. Customer receives ‘special’ HDDs with hardware encryption and a software tool he can move his data into.
  2. The software tool makes sure the data is correctly copied and creates an encrypted container for it.
  3. Customer ships the HDDs to a Storjlabs-approved data center.
  4. The data center loads the container into their tool and uploads the data. The tool verifies that the uploaded data is correct while maintaining encryption.
  5. If everything is successful, the data gets securely wiped from the HDDs.
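The “makes sure data is correctly copied” and “uploaded data is correct” parts of steps 2 and 4 above could be sketched like this. This is only an illustration, not an actual Storj tool, and all names are hypothetical: the customer’s tool records a SHA-256 hash per file while filling the drive, and the data center re-checks every file against that manifest before and after the upload.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so huge files never need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Customer side: record a hash for every file copied onto the drive."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Data-center side: return the names of files that are missing or corrupted."""
    return [name for name, digest in manifest.items()
            if not (root / name).is_file() or sha256_of(root / name) != digest]
```

A real tool would of course also handle the encryption side; this only covers the integrity check.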

And to make sure that the data center meets all requirements, it could become something like “Storjlabs or Tardigrade approved”.
Again: I have no idea how this could be achieved in terms of technical detail. Do you have details on how Amazon or Backblaze do this?

Edit: I was just thinking about digital cinema again: they have this approach with DCP container files and KDM files to unlock the encryption. Maybe something like this could work in such an upload scenario as well, so that without a customer-provided ‘KDM’ file, the data on the disks remains unusable even for the uploading data center.

Yes, that’s true. But I was thinking from the other side: about additional services a commercial or data center SNO could provide, as Storjlabs has mentioned such partners in the past. Those could be the preferred choice for such upload tasks. They could become something like “Storj approved” if they follow certain procedures and fulfill all requirements, and could also be listed on the Tardigrade website and so on. So a customer could be sure that this service works seamlessly with Tardigrade (maybe even with special tools provided by Storjlabs) and meets all other requirements, like bandwidth, CPU power, privacy, security, etc.


Google also provides such a service for their cloud

I like this comparison chart:

Just imagine customers like the one from that linked article, which said that for a single feature movie, the overall materials data size produced is around 350 TB.

Not so easy: unlike Google, we do not keep customers’ keys on Tardigrade. The encryption right now is client-side, not server-side like Google’s. Thus you would need to provide your encryption passphrase to that uploader, or generate an access grant with an expiration.
It doesn’t look secure anyway, because you need to trust the uploader and believe that they do not tamper with or sell your data in between.
Hardware encryption will not help much if you do not trust the uploader, because you need to provide your TPM to the uploader to unlock your data on the hardware.
I can imagine only one way: you encrypt your data with some tool (PGP/GPG or just AES-256, for example) and then give a time-limited access grant (write-only) to the uploader. It will be able to upload data to your buckets (only those allowed by that access grant), and later this access grant will expire. You can also revoke it.
However, you then need to decrypt the uploaded data with your encryption/decryption tool on the fly every time you access it.
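To illustrate the pre-encryption idea: the point is that the uploader only ever handles ciphertext, and the owner decrypts after retrieval. The following is a toy stream cipher built from SHA-256 purely for demonstration; in practice you would use GPG or AES-256 as mentioned above, not this construction.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream by hashing key+nonce+counter (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; prepend the random nonce."""
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    """Strip the nonce, regenerate the same keystream, XOR back to plaintext."""
    nonce, ciphertext = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))
```

The uploader receives only the output of `encrypt`; the downside, as noted above, is that every later read goes through `decrypt` outside of uplink.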

Maybe one way to do it would be to have a “print to file” equivalent for the uplink. Uplink generates encrypted segments and saves them to a hard drive.

Then the uploader splits them up for redundancy and uploads them without ever knowing what was in that data.

The end result would be that the owner of that data would be able to decrypt it with uplink normally, without using another encryption tool.


That would not work. They need write access to your buckets, so you will be forced to give them your access grant for these buckets with at least write permission anyway.
However, I like your idea.

An interesting option might be to pre-encrypt data with uplink but not actually upload it; instead, just create an encrypted copy on the filesystem and generate an access grant for the uploader which also encodes the information that this data is already encrypted. So when the uploader uploads the data, it is not encrypted twice but used as provided. The rest of the flow would be the same as if the customer had uploaded the data himself.

Part of the idea is that the uploader is trusted by Storjlabs and has a certain reputation as a data center provider.

Hardware encryption is mostly meant for transportation issues (lost, stolen, wrong delivery, etc.) and maybe for secure deletion (destroy the key instead of wiping hundreds of terabytes). Of course the customer and the upload provider must be able to access the data in some way.

Something like this sounds interesting. But on the customer side, the resource requirements should be as low as possible. Imagine having to encrypt and split 250 TB of data; in the worst case the customer would need 2.7x the HDD space for his data. It would be perfect if the creation of redundancy and the final encryption could be done on the upload provider’s side, which possibly has hundreds of computers that could run such jobs.

This means that you are willing to expose your encryption passphrase (or at least an encryption key derived from it, in the case of an access grant) to the agent.

If it is a key derived from the ‘master’ encryption key, and the agent has only write access, meaning he cannot see the other contents (maybe even restricted to a bucket specifically created for the uploads), and the key can be voided afterwards, why not?

Then you can do it now; you only need an agreement between you and the uploader, without involving third parties (Storjlabs in this case).

You mean like what you wrote here:

No, with a small update:

  1. Conclude an agreement with the uploader (service provider).
  2. Use hardware encryption to transport your vault securely to the uploader’s data center.
  3. Give the TPM and a restricted access grant (with only write permissions, and not valid after a decent amount of time) to your uploader.
  4. The uploader unlocks the vault with the TPM and transfers the data using the given access grant.
  5. The uploader reports that everything is done, wipes the vault, and destroys/demagnetizes the TPM.
  6. You revoke the access grant or delete the API key from the satellite and create a new one.

I think we misunderstood each other.
I thought you were talking about the encryption key the uploader needs to upload data onto Tardigrade.
But it seems you are talking about the encryption keys that secure the data from prying eyes.
For that, my answer is: maybe there are customers that don’t care, but honestly I think the solution should be a zero-knowledge one which does not require the agent to access the unencrypted data.
As said, I have no idea what would be technically required to achieve this, but I am sure it would require Storjlabs’ involvement to find a solution.

There is also the aspect of customer service that I would like to bring to your attention. I mean, seriously (no offense): if a customer wants to transfer hundreds of terabytes onto Tardigrade and requires an on-premise solution for it, all Storjlabs would be saying is: “Go find some data center you can send your hard disks to, make some contract, and all is good”? Really?
Compare that with the service a potential customer would receive from Google or Amazon, who both send hardware appliances to the customer so he can transfer his data reliably and securely. Amazon even has mobile data centers that customers can order.

So I think your suggested solution is possible, but not what it should be, especially for such high-volume customers. Storjlabs’ involvement in such a process is absolutely mandatory.

Honestly, I do not see how it could be solved without using server-side encryption, or a temporary access grant (both require trust in the service provider, because initially the data is unencrypted), or pre-encryption of the data to be transferred (which requires additional overhead on the customer’s side).

And what is the role of Storjlabs here?
To be the guarantor or arbiter between you and the service provider?

I am just wondering whether the repair function could be of any use, and whether the upload agent could act like a satellite.
Basic idea: the customer creates pieces locally, but only the minimum number required for repair, to keep the amount of data as small as possible, and moves them to the upload agent. The upload agent runs something like a local ‘repair satellite’ which sees the minimum set of pieces on the local disks and repairs them into the Tardigrade network.
Similar to a regular satellite when a file requires repair.
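Storj’s actual erasure coding is Reed-Solomon (any 29 of 80 pieces rebuild a segment), which is too involved to sketch here, but a single XOR parity piece is the simplest possible analog of the “ship only the minimum, derive the redundancy later” idea: the customer prepares only the data pieces, and the agent can rebuild any single lost piece from the rest. This is a toy illustration, not Storj’s algorithm.

```python
def split(data: bytes, k: int) -> list:
    """Split data into k equal-size pieces, padding the last with zero bytes."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(pieces: list) -> bytes:
    """Derive one parity piece: the byte-wise XOR of all given pieces."""
    parity = bytearray(len(pieces[0]))
    for piece in pieces:
        for i, b in enumerate(piece):
            parity[i] ^= b
    return bytes(parity)

def rebuild(pieces: list, parity: bytes) -> list:
    """Reconstruct at most one missing piece (marked None) by XOR-ing parity with the survivors."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    assert len(missing) <= 1, "XOR parity can only repair a single lost piece"
    if missing:
        survivors = [p for p in pieces if p is not None] + [parity]
        pieces[missing[0]] = xor_parity(survivors)
    return pieces
```

Reed-Solomon generalizes this to surviving the loss of many pieces at once, which is what makes the 29-of-80 scheme work.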

Storjlabs should make any required changes to the software and provide the necessary tools (if needed, for both customer and upload agent). Storjlabs should define the whole process and standards, and certify and audit the data centers to make sure they meet all requirements. Storjlabs could even go as far as producing or purchasing the required hardware, like special rugged disks similar to the ones used to transport movies to cinemas, or what AWS or Google provide.
In my idea, Storjlabs is the contact point for everything. It goes like this:

  1. Customer signs up with Tardigrade.
  2. Customer contacts Storjlabs, saying he wants to ramp up a few hundred TB.
  3. Storjlabs notifies the nearest upload center.
  4. The upload center prepares the disks and sends them to the customer.
  5. The customer receives all required hardware, instructions, and a direct contact at the upload center for questions.
  6. Storjlabs keeps monitoring.
  7. The customer sends the disks to the upload center.
  8. Storjlabs keeps monitoring.
  9. Once all data is on Tardigrade, Storjlabs bills the customer and pays the data center.

Surely other approaches are possible, but this is just how I would do it.

The idea is interesting; however, the repair worker is not a satellite, only part of one, and it can be deployed separately.
The only problem is that the repair worker does not deal with files, it deals with pieces: it downloads the 29 healthy pieces, reconstructs the 51 missing pieces, and uploads those 51 pieces back to the network (and not to the same nodes).

This problem could be solved differently: you could specify the minimum required number of pieces on the client side (even if it would not meet the requirements on the satellite), and if there were a mechanism to store pieces locally instead of distributing them, that could be a possible solution which doesn’t require trust. The agent just uploads the prepared pieces, and then a regular repair worker repairs the missing ones.
Moreover, I think such a deal could be done as a smart contract, i.e. once all pieces are delivered to the network, the agent can release the funds as payment for their service; in that case there would be no need to monitor anything.
So, the customer:

  1. Should prepare pieces for upload (with 29 pieces out of 80, the overhead would be 100.41% instead of 270%, but that’s too risky; better to prepare 35 pieces) and put them into the vault.
  2. Will pay the service provider for delivery and upload.
  3. Will pay the network for repairing the missing pieces (this payment could be transferred to the service provider, but I think the service provider would charge the customer more than that in such a case).
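The overhead figures above can be sanity-checked from the 29-of-80 parameters mentioned in this thread: each erasure-coded piece is 1/29 of a segment, so preparing p pieces stores roughly p/29 of the original size. This back-of-the-envelope version ignores per-piece metadata, which is presumably where the quoted 100.41% (rather than a flat 100%) comes from.

```python
K, N = 29, 80  # per this thread: any 29 of 80 pieces rebuild a segment

def overhead(pieces_prepared: int, k: int = K) -> float:
    """Bytes stored relative to the original data.

    Each piece is 1/k of a segment, so preparing p pieces costs
    p/k times the original size (ignoring per-piece metadata).
    """
    return pieces_prepared / k
```

With these parameters, shipping the bare minimum of 29 pieces costs ~100% of the original size, the safer 35 pieces cost ~121%, and producing all 80 pieces up front costs ~276% (the 2.7x / 270% figure used elsewhere in this thread).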

The pro:

  • The customer will not need to have an additional tool for pre-encryption and post-decryption for their data.
  • The time of migration could be shorter than a direct upload via customer’s channel.
  • The proposed solution does not require trust in the service provider.
  • Using a smart contract removes the need for complicated physical monitoring of the deal between the customer and the service provider.

The cons:

  • The customer will use their own compute resources to encrypt and slice the data.
  • The customer will pay not only for delivery and upload but for repair too (they could avoid the repair costs by producing the complete number of pieces, i.e. 270% overhead, which requires more space in the vault and will definitely cost more; then again, the repair costs might be lower than the cost of the additional vault space).
  • A smart contract and an oracle for monitoring the deal would need to be implemented.