Am I correct that client downloads can be made directly from storagenode, as opposed to having the satellite as a middleman? This allows for faster downloads by getting pieces in parallel.
Do uploads operate the same way? Can clients upload directly to storagenodes? Can this still be faster than going through the satellite, given that the client would have to upload all the redundant pieces as well?
The satellite is never a middleman; it's an address book, bookkeeping, audit, and repair service, that's all.
All downloads are handled by storagenodes directly. The only exception is inline segments, which are stored directly on the satellite (it acts as a storagenode in that case).
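The reason pieces can be fetched from many nodes in parallel is erasure coding: each segment is encoded into n pieces, of which any k are enough to reconstruct it. A toy 2-of-3 XOR code shows the idea (Storj actually uses Reed-Solomon with much larger parameters; this sketch is only illustrative):

```python
def encode(data: bytes):
    """Toy 2-of-3 erasure code: split data in half, add an XOR parity piece."""
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(pieces, orig_len):
    """Reconstruct the original data from ANY two of the three pieces."""
    a, b, parity = pieces
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:orig_len]

data = b"hello world"
a, b, p = encode(data)
# Piece b is "lost" (e.g. a slow or offline node), yet the data survives:
print(decode([a, None, p], len(data)))  # b'hello world'
```

Because any k pieces suffice, a client can request more pieces than it needs and simply keep the first k that arrive, which is where the parallel-download speedup comes from.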
Do you mean the Gateway-MT service? Even in that case the data is downloaded from storagenodes, then combined into files and decrypted, then delivered to the customer via an encrypted channel. Gateway-MT is a distributed service too (as is the satellite), but the backbone is a distributed network of nodes.
They never upload via satellites. If they use a native Storj integration, their uplink requests 110 independent nodes from the satellite for each segment of the file (64 MiB or less) and then starts uploading in parallel; when the optimal number of uploads have finished, all remaining ones are canceled.
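The long-tail cancellation described above can be sketched as follows. The numbers 110 and 80 are illustrative of "requested" vs. "optimal" piece counts; the real values depend on the satellite's Reed-Solomon settings, and the network upload itself is abstracted away:

```python
import concurrent.futures

REQUESTED = 110  # nodes handed out by the satellite per segment
OPTIMAL = 80     # uploads that must finish before the rest are canceled

def upload_piece(node):
    # Placeholder for uploading one erasure-coded piece to one storage node.
    return node

def upload_segment(nodes):
    finished = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(upload_piece, n) for n in nodes]
        for done in concurrent.futures.as_completed(futures):
            finished.append(done.result())
            if len(finished) >= OPTIMAL:
                # Long-tail cancellation: the slowest uploads are abandoned
                # so one sluggish node cannot stall the whole transfer.
                for f in futures:
                    f.cancel()
                break
    return finished

nodes = [f"node-{i}" for i in range(REQUESTED)]
print(len(upload_segment(nodes)) >= OPTIMAL)  # True
```

The same pattern (over-request, keep the fastest responders, cancel the tail) is what makes both uploads and downloads fast despite individual nodes having unpredictable latency.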
If they use an S3 Storj integration instead, they upload files unencrypted over a secure SSL connection to Gateway-MT (if they use the S3 Compatible Gateway Hosted by Storj - Storj Docs) or to their own Gateway-ST (Setting Up a Self-Hosted S3 Compatible Gateway - Storj Docs), where the files are encrypted with the provided encryption key, sliced into pieces, and uploaded to the nodes via the Storj native protocol as described above.
Help me understand this. If the satellites go away, are clients still able to retrieve their data without them? Does each client store the locations of the shards on each storage node?
No. The satellites hold all the metadata, access grants for clients to access the nodes, etc., and without those you won't be able to retrieve anything.
No, they need metadata from the satellite to find the nodes which store the pieces of their files.
However, each satellite is a distributed service spanning many datacenters, including its database. So it's very hard to take it down (unless we introduce a bug). For that case we have backups and the trash folder on the nodes, so in the unfortunate event we can restore the database from a backup and restore pieces from the trash, so that nothing is lost for the time since the backup.
No, that would be inconvenient for the clients; each of them would be forced to run their own full node, as in SIA. It would also mean that if the customer's PC died, the data would be lost forever, and they wouldn't be able to recover it from anywhere if they didn't have a backup.
So, it's always a compromise between convenience, full decentralization, and speed. Choose any two.
Satellites do not store access grants; they do not need to: the access grant is a macaroon, and it's self-contained. It has everything needed to access the data: the API key, the satellite URI, the derived encryption key, and the permissions. Anyone who has the access grant may have access to the data.
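A macaroon is self-contained because its signature is an HMAC chain over an identifier and every caveat (restriction) added to it, so the satellite can verify it statelessly with only its root key. A minimal sketch of that construction, with field names and keys that are purely illustrative (not Storj's actual serialization):

```python
import hmac
import hashlib

def mint(root_key: bytes, identifier: bytes):
    """Mint a macaroon: an identifier plus an HMAC chain as the signature."""
    sig = hmac.new(root_key, identifier, hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}

def attenuate(m, caveat: bytes):
    """Add a restriction. Anyone holding the macaroon can ADD caveats,
    but nobody can REMOVE one without the root key."""
    sig = hmac.new(m["sig"], caveat, hashlib.sha256).digest()
    return {"id": m["id"], "caveats": m["caveats"] + [caveat], "sig": sig}

def verify(root_key: bytes, m) -> bool:
    """Recompute the HMAC chain; it matches only if nothing was tampered with."""
    sig = hmac.new(root_key, m["id"], hashlib.sha256).digest()
    for caveat in m["caveats"]:
        sig = hmac.new(sig, caveat, hashlib.sha256).digest()
    return hmac.compare_digest(sig, m["sig"])

grant = attenuate(mint(b"satellite-root-key", b"api-key-1"), b"read-only")
print(verify(b"satellite-root-key", grant))  # True
```

This is why sharing an access grant is equivalent to sharing access: the token itself carries everything, and the satellite only needs to check the chain, not look anything up.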
However, if you use an S3 integration and our Gateway-MT, then your access grant will be stored on our servers in encrypted form (it's encrypted with the S3 Access Key). So if you registered this access grant for public access (e.g. used the linksharing service), then anyone who has the Access Key can retrieve the data according to the permissions embedded in the registered access grant. But if public access is not allowed, you also need the S3 Secret Key in addition to the S3 Access Key.
This is called a "coordinator", because "middleman" would mean that the data flows through it, which is not the case.
The satellite is part of the Storj network; it contains the customer's encrypted metadata. To get access to the metadata you need an API key and the satellite URI. But the metadata alone is not enough: it would only give you the list of nodes and the order for merging the pieces back into segments and then into the encrypted file (if the permissions embedded in the API key, which is a macaroon too, allow you to do so).
You also need an encryption key to be able to decrypt the file. And in any case, you always download your data from the nodes and upload your data to the nodes, either directly using libuplink or indirectly using an S3 gateway (ours or yours). You do update your metadata on the satellite, though (it has to account for how much data you are storing, how much data you downloaded, your bills and invoices, etc.).
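The separation between the two credentials can be sketched: the API key gets you metadata (node list and piece ordering), which yields only ciphertext, while the encryption key is derived client-side from a passphrase the satellite never sees. An illustrative derivation below; the function name and scheme are assumptions for this sketch, and Storj's real key derivation is hierarchical and more involved:

```python
import hashlib

def derive_key(passphrase: str, bucket: str, path: str) -> bytes:
    """Illustrative only: derive a per-object key from a root passphrase.
    (Storj's actual derivation is hierarchical and more involved.)"""
    root = hashlib.sha256(passphrase.encode()).digest()
    return hashlib.sha256(root + bucket.encode() + b"/" + path.encode()).digest()

# The satellite can order pieces into an encrypted object, but without the
# passphrase it cannot produce the decryption key:
k1 = derive_key("my secret passphrase", "photos", "cat.jpg")
k2 = derive_key("wrong passphrase", "photos", "cat.jpg")
print(k1 != k2)  # True
```

So even a compromised satellite would leak only encrypted metadata and piece locations, not file contents.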
This is the opposite of "decentralized"; this is distributed. Storj is highly centralized: control is centralized in the Storj-run satellites, including pricing and customer engagement.
And there is a reason for that. On a decentralized network, for example, there is no stable price, and it is not fully clear who is responsible for what. When it is under Storj's control, they have a contract with the client, which regulates all of these aspects. For us it also means that no one can show up with dumping prices and pull all the data to themselves. It is also very hard to do price calculations when you have 80 pieces, each with a different price. And if there were an open market, someone could offer a very low price and exabytes of space, collect all the data, and then shut down their servers, causing big damage. Today that behavior is almost impossible.
As I said, there are always compromises. If you want full decentralization (not only of data and metadata (!), but also no central control entity, plus a market), you would have to solve multiple challenges: simple access to your data from a mobile device, for example, or restoring your data from the network when you lose your hardware; you also want high speed for uploads and downloads, and the ability to control access and securely share your data with others, etc. All of that is not yet fully possible on a fully decentralized network. A mobile device cannot run a full node, and you also cannot store the metadata on the network itself (because of the latency and the required consensus), which makes your device with the full node a central point of failure, etc.
It has been discussed many times, and there is no good solution for the coordination problem: