Part of the idea is that the uploader is trusted by Storj Labs and has a certain reputation as a data center provider.
Hardware encryption is mostly meant to cover transportation issues (lost, stolen, or misdelivered disks, etc.) and maybe secure deletion (destroying the key instead of wiping hundreds of terabytes). Of course the customer and the upload provider must still be able to access the data somehow.
Something like this sounds interesting. But on the customer’s side the resource requirements should be as low as possible. Imagine having to encrypt and split 250 TB of data; in the worst case the customer would need 2.7x the HDD space for their data. It would be perfect if the creation of redundancy and the final encryption could be done on the upload provider’s side, who possibly has hundreds of machines that could run such jobs.
If it is a key derived from the ‘master’ encryption key, the agent has only write access (meaning they cannot see the other contents), is maybe even restricted to a bucket specifically created for the uploads, and the key can be voided afterwards, why not?
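For illustration, a minimal sketch of how such a restricted grant could be produced with the storj.io/uplink Go library: write-only, limited to a single bucket, and expiring on its own. The bucket name, the 30-day expiry and the overall workflow are just assumptions for the example, not an official Storj Labs process:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"storj.io/uplink"
)

func main() {
	// The root access grant stays with the customer; only the restricted
	// grant produced below would be handed to the upload agent.
	rootGrant := "..." // placeholder: the customer's own access grant

	access, err := uplink.ParseAccess(rootGrant)
	if err != nil {
		log.Fatal(err)
	}

	// Upload-only permission, restricted to one bucket created for the ingest,
	// and expiring automatically after 30 days.
	restricted, err := access.Share(
		uplink.Permission{
			AllowUpload: true,
			NotAfter:    time.Now().Add(30 * 24 * time.Hour),
		},
		uplink.SharePrefix{Bucket: "bulk-ingest"}, // hypothetical bucket name
	)
	if err != nil {
		log.Fatal(err)
	}

	serialized, err := restricted.Serialize()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("hand this to the upload agent:", serialized)
}
```

With no AllowDownload/AllowList/AllowDelete the agent can put new objects into that bucket but cannot read or list what is already there, and the grant stops working after the NotAfter date.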
I think we misunderstood each other.
I thought you were talking about the encryption key for the uploader to upload data onto Tardigrade.
But it seems you are talking about the encryption keys that secure the data from prying eyes.
For this my answer is: maybe there are customers who don’t care, but honestly I think the solution should be a zero-knowledge one that does not require the agent to access the unencrypted data.
As said, I have no idea what would be technically required to achieve this, but I am sure it would require Storj Labs’ involvement to find a solution.
There is also the aspect of customer service that I would like to bring to your attention. I mean, seriously (no offense), if a customer wants to transfer hundreds of terabytes onto Tardigrade and requires an on-premises solution for this, all Storj Labs would be saying is: “Go find some data center you can send your hard disks to, make some contract, and all is good”? Really?
I mean, compare that to the service a potential customer would receive from Google or Amazon, who both send hardware appliances to the customer so they can transfer their data reliably and securely. Amazon even has mobile data centers that customers can order.
So I think your suggested solution is possible, but not what it should be, especially for such high-volume customers. Storj Labs’ involvement in such a process is absolutely mandatory.
Honestly, I do not see how it could be solved without using server-side encryption or a temporary access grant (both require trust in the service provider, because the data is initially unencrypted), or pre-encryption of the data to be transferred (which requires additional overhead on the customer’s side).
And what would the role of Storj Labs be here?
To be the guarantor or arbiter between you and the service provider?
I am just wondering right now whether the repair function could be of any use and whether the upload agent could act like a satellite.
Basic idea: the customer would create pieces locally, but only the minimum number required for a repair, to keep the amount of data as small as possible. They move this to the upload agent. The upload agent runs something like a local ‘repair satellite’ which sees the minimum set of pieces on the local disks and repairs them into the Tardigrade network.
Similar to what a regular satellite does when a file requires repair.
Storj Labs should make any required changes to the software and provide the required tools (if needed, for both the customer and the upload agent). Storj Labs should define the whole process and the standards, and certify and audit the data center to meet all requirements. Storj Labs could even go as far as producing or purchasing the required hardware, like special rugged disks similar to the ones we know are used to transport movies to cinemas, or what AWS and Google provide.
In my idea Storj Labs is the contact point for everything. It would go like this:
Customer signs up with Tardigrade.
Customer contacts Storj Labs to say they want to onboard a few hundred TB.
Storj Labs notifies the nearest upload center.
The upload center prepares the disks and sends them to the customer.
The customer receives all required hardware, instructions, and a direct contact at the upload center for questions.
Storj Labs keeps monitoring.
The customer sends the disks to the upload center.
Storj Labs keeps monitoring.
Once all data is on Tardigrade, Storj Labs bills the customer and pays the data center.
Surely other ideas are possible, but this is just how I would do it.
The idea is interesting. The repair worker is not a satellite, only a part of it, and it could be deployed separately.
The only problem is that the repair worker does not deal with files, it deals with pieces: it should download 29 healthy pieces, reconstruct the 51 missing pieces, and upload those 51 pieces back to the network (and not to the same nodes).
This problem could be solved differently: you could specify the minimum required number of pieces on the client side (even if it would not meet the requirements on the satellite), and if there were a mechanism to store pieces locally instead of distributing them, that could be a possible solution which doesn’t require trust. The agent just uploads the prepared pieces and then a regular repair worker repairs the missing pieces.
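Just to illustrate the 29-of-80 erasure coding being discussed (any 29 pieces are enough to recompute the other 51), here is a rough stand-in sketch using the klauspost/reedsolomon Go library. Storj’s uplink uses its own Reed-Solomon implementation per segment with its own piece format, so this only shows the principle, not the real pipeline:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 29, 51 // 80 pieces total, any 29 recover the data

	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in for one segment of customer data.
	segment := bytes.Repeat([]byte("example segment data "), 10000)

	// Split into 29 data shards (plus empty parity shards) and compute the parity.
	shards, err := enc.Split(segment)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate shipping only the minimum 29 pieces: drop the other 51.
	for i := dataShards; i < dataShards+parityShards; i++ {
		shards[i] = nil
	}

	// The "repair" step: rebuild the 51 missing pieces from the 29 that are present.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	if err != nil || !ok {
		log.Fatal("verification failed")
	}
	fmt.Println("all 80 pieces reconstructed from 29")
}
```

In the proposed flow the customer would only run the split/encode part and ship 29 (or better 35) pieces, while the reconstruction of the rest would happen on the network side as a normal repair.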
Moreover, I think such a deal could be done as a smart contract, i.e. once all pieces are delivered to the network, the agent can claim the funds as payment for their service; in that case there would be no need to monitor anything.
So, the customer:
Should prepare the pieces for upload and copy them to the vault (with 29 pieces out of 80 the overhead would be about 100.41% instead of 270%, but that’s too risky; better to prepare 35 pieces; see the rough calculation below the cons).
Will pay the service provider for delivery and upload.
Will pay the network for repairing the missing pieces (this payment could be passed on to the service provider, but I think the service provider would charge the customer more than that in such a case).
The pros:
The customer will not need an additional tool for pre-encryption and post-decryption of their data.
The migration time could be shorter than a direct upload via the customer’s own connection.
The proposed solution does not require trust in the service provider.
Using a smart contract removes the need for complicated physical monitoring of the deal between the customer and the service provider.
The cons:
The customer will use their own compute resources to encrypt and slice the data.
The customer will pay not only for delivery and upload but for repair too (they could avoid the repair costs if they produced the complete number of pieces, i.e. 270% overhead, which requires more space in the vault and will definitely cost more; maybe the repair costs would be lower than the cost of the additional vault space).
A smart contract and an oracle for monitoring the deal would need to be implemented.
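For a rough sense of the overhead numbers mentioned above: with 29-of-80 erasure coding each piece is about 1/29 of a segment, so shipping n pieces costs roughly n/29 of the original size: 29/29 ≈ 1.00 (about 100%, roughly the quoted 100.41%), 35/29 ≈ 1.21 (about 121%), and 80/29 ≈ 2.76 (the familiar ~2.7x, i.e. the 270% figure).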
What are these repair costs? Is there a way to quantify them? I understand normal repair costs as pieces having to be downloaded from SNOs, who get paid for that.
But that is not the case here. The upload agent has all pieces locally. Uploading to Tardigrade should not come with costs.
Only the reconstruction comes to mind, which is probably CPU resources plus disk space.
Am I overlooking something?
Unfortunately, the upload to the network is not free for most cloud computing providers.
If you mean that the service provider should act as a repair worker, then that is not desirable, because the worker requires direct access to the satellite’s database and thus must be trusted by the Tardigrade satellite (and have keys).
The other way is to act as an uplink on behalf of the customer; in that case the service provider must have the customer’s keys (or access grant).
Neither way is good, and both require trust. And it would be much more complicated with all the documents related to data security. It’s better to avoid any trust requirements between the parties.
The simplest workable solution would be to upload 35 pieces to the network and let the repair workers do their job for the repair costs. Or upload all required 80 pieces and not pay this fee.
Only one problem remains: the path to the bucket and the account (API key). The pieces themselves do not contain such info.
So we can’t avoid giving an access grant to the agent to upload those pieces to the bucket. But in the case of pre-encrypted pieces we can give only a temporary write access without exposing even a derived encryption key.
Here is a nice AWS case study for such a service (which seems to be required to get big data into the cloud):
As the DAM system was being implemented, the Rock Hall was undertaking a project to modernize its aging LTO storage and offsite backup for the preservation of large digital files, by moving the files to the cloud. Many of the LTO tapes were not easily accessible due to hardware and software failures and onsite storage limitations. Parnin says, “As the tech landscape progressed from the non-digital to digital, our archival storage system had become unmanageable and unsustainable.”
Using Amazon S3 and S3 Glacier Deep Archive provided the Rock Hall with the confidence that its digital media would be preserved and easily accessible at an affordable price. However, the Rock Hall still needed to recover the data on the LTO tapes. Working with AWS, the project team ingested the files into S3 Glacier Deep Archive via six AWS Snowball Edge Storage Optimized devices. Using AWS Snowball Edge helped to address common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns.
The Rock Hall worked with Tape Ark and its strategic partner Seagate Technology (Seagate Powered by Tape Ark) to extract all the data from the LTO tapes and load it onto the AWS Snowball Edge devices. Tape Ark then sent the Snowball devices loaded with data back to AWS, and the data went right into the Rock Hall’s Amazon S3 bucket. Once the Rock Hall’s digital media was in Amazon S3 on the AWS Cloud, Amazon S3 lifecycle policies were set up to automatically move the media files into S3 Glacier Deep Archive to help optimize storage costs for files that were rarely accessed. For the Rock Hall, the process was effortless, with Tape Ark managing the end-to-end migration.
@Alexey: I don’t know if the new Gateway MT could help with that. But if you read the above case study, I believe a solution is really needed to get big data on board. I think nobody in their sane mind would even try to upload 300 TB x 2.7 = 810 TB over a normal internet connection. Even with an SDSL fibre connection with 1 Gbit/s upload speed, which is among the fastest you can get here, it would take 75 days.
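(For the math: 810 TB is about 6.48 × 10^15 bits; at 10^9 bit/s that is roughly 6.48 × 10^6 seconds, i.e. about 75 days of continuous, fully saturated upload.)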
From my point of view, there needs to be a way to prepare everything locally on disks, send them to a Storj Labs approved data center, and have them upload it within a couple of days or even hours.
Yes, Gateway MT may help here, but only for the service provider, since their bandwidth would be utilized only 1x instead of 2.7x.
Everything else remains the same. The only problem there is the unencrypted data.
In the case of Gateway MT the data can also only be server-side encrypted at the moment; client-side encryption for Gateway MT is not implemented yet.