Inter-Satellite Metadata Replication

The Storj whitepaper discusses the complications and bottlenecks arising from various Byzantine consensus algorithms and, at least for now, settles on a centralised metadata/maintenance mechanism. Why not allow the client uplink/rclone to replicate the metadata between multiple satellites? That is, the data would still be stored on the same nodes, but multiple satellites would store the metadata and do the audits and repairs. The pool of candidate nodes would be restricted to the intersection of the nodes that trust each involved satellite, and the satellites would need to trust each other to coordinate node selection. Alternatively, the client uplink/rclone could choose which satellite's node selection to use and just inform the other satellites.

The benefit for satellite owners would be greater demand; the benefit for node operators would be greater bandwidth utilization for audits and less reliance on a particular satellite; the benefit for the end user would be a more efficient and cheaper way to remove reliance on a single satellite (assuming the fees for metadata storage are lower because the underlying data is not replicated). If the only way for a user to remove reliance on a single party is to manually replicate the data, they might as well diversify across protocols/backends too, such as Storj, Sia, AWS/GCP/Azure, etc. If, however, Storj offers an easier and cheaper way, people might choose to stick with Storj.

At this point all major satellites are, perhaps, operated by the same company, so this wouldn't be as effective yet, but it would open the door to the Storj ecosystem flourishing in the future!

Byzantine consensus is only required when distributed, mutually distrusting nodes need to coordinate changes with each other. If, however, every change originates from the same place - the client - a two-phase commit or even eventual replication works just fine. Audit checks can still be done separately by each satellite, and repair and data relocation can be run by the client as a cron job.
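As a rough illustration of the eventually-consistent variant (all names here - `Satellite`, `PutMetadata`, `Record`, `Replicator` - are made up for the sketch, not the real uplink API): because every write comes from the one uplink, a failed satellite just goes on a retry queue and converges later.

```go
package replication

import "context"

// Record is a hypothetical serialized metadata entry.
type Record struct {
	Path  string
	Bytes []byte
}

// Satellite is an illustrative interface, not part of the real uplink library.
type Satellite interface {
	PutMetadata(ctx context.Context, r Record) error
}

type pending struct {
	sat Satellite
	rec Record
}

// Replicator pushes every record to every satellite and keeps a retry queue
// for the ones that were unreachable, converging once they come back.
type Replicator struct {
	sats  []Satellite
	queue []pending
}

// Write sends the record to all satellites; failures go on the retry queue.
func (r *Replicator) Write(ctx context.Context, rec Record) {
	for _, s := range r.sats {
		if err := s.PutMetadata(ctx, rec); err != nil {
			r.queue = append(r.queue, pending{sat: s, rec: rec})
		}
	}
}

// Flush retries the backlog; run it periodically, e.g. from the same cron job
// that checks audit stats.
func (r *Replicator) Flush(ctx context.Context) {
	remaining := r.queue[:0]
	for _, p := range r.queue {
		if err := p.sat.PutMetadata(ctx, p.rec); err != nil {
			remaining = append(remaining, p)
		}
	}
	r.queue = remaining
}
```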

Hello @RedBlue,
Welcome to the forum!

It's unlikely to be automatic. The access grant contains the satellite address, its API key, your encryption key, and caveats (see Understand and Manage Access Grants - Storj Docs), so you need to explicitly choose an access grant before starting an upload or download.

Replicating metadata sounds like it would increase costs, so you may as well upload your data using different satellites: you would have up to 3 copies of your data (4 if you also choose Storj Select), and you can then pick the access grant to download your data via whichever satellite you need.
However, I did not get why you need that. Data is transferred directly between your uplink and the nodes, and you upload to or download from the fastest nodes for your location in any case; the satellite here almost doesn't matter.

I would want that as a customer, in case of a catastrophic satellite failure. Or in case Storj Labs were to go out of business: if there were viable third-party satellites, I'd want to duplicate metadata to those.

There are some hurdles that we’d have to get past to implement this, though. We’d need some way of working with multiple access grants, and a new storage format for pieces that can be owned by more than one satellite. Then it could be tricky or expensive to identify a set of destination nodes that all trust all of your destination satellites. Things like that.

But in the meantime, if you don’t mind paying for the data twice, you can store it twice!


You’d specify multiple satellite addresses and their API keys. The encryption password would have to stay the same, so you’d have to separate access grants into subcomponents for this use case.

You can already do that, but a 3x satellite replication would 3x the cost! If you only replicate the metadata, it would be significantly cheaper, since you don’t have to pay for storage 3 times.

In the spirit of decentralisation! :grin: Storj does decentralise the data, but all the metadata is controlled by the same company. If the metadata is damaged, the data might as well be gone too, since you can’t possibly find it. That’s fine and acceptable, but why not make it better if we can?

Hi!

As you point out, the Satellite as it is currently designed is responsible for metadata storage and management. It is also responsible for data repair, liveness, and storage node payment. So here are the main questions we've been asking ourselves about such an idea:

  • If two Satellites, run by different organizations, have copies of the metadata for a collection of 1 TB stored on storage nodes, which Satellite is responsible for repair? Which Satellite chooses which new nodes the data should be stored on and moves the data to them?

  • How do the Satellites agree about updates to data? Customers want atomicity on upload/update. How are synchronization issues across the Satellites solved? Is one primary and one secondary? How do they agree on failover?

  • Which Satellite pays the storage nodes?

These questions aren't impossible to address, of course, but any proposal involving multiple Satellites needs to resolve them. We haven't come up with a cohesive strategy on these questions ourselves (yet).


Btw, thank you!

I also think it's a matter of picking the right solution out of the multiple ways this could be implemented, and that should be possible. Some of these questions have a single correct answer; others are a matter of preference.

Q: Which satellite is responsible for repair? Which satellite chooses the new nodes? How do they agree on the nodes?
A: There are a couple of possible ways to do this.
1. All the satellites run audits and accumulate the stats, but don't relocate or repair the data. The client periodically checks the audit stats from all the involved satellites and relocates the data itself. This effectively shifts the responsibility of picking nodes to the client, so it would require a client that regularly checks up on the data.
2. Satellites could vote among each other, assigning scores to each candidate storage node and running a deterministic optimisation algorithm to pick the nodes (e.g. sum up the scores and sort; rough sketch below). If one or more of the satellites is Byzantine, the storage node could notify the others that it didn't get approved by that satellite. Other storage nodes could be tried, and if the disagreement persists, the satellites could inform the client that they disagree. Then, I guess, the client can pick different satellites.
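Just to illustrate the scoring step in option 2 (everything here is hypothetical, not existing satellite code): because the tie-break is deterministic, every satellite - and the client - that runs this on the same inputs gets exactly the same answer, so no leader is needed.

```go
package selection

import "sort"

type NodeID string

// PickNodes sums the per-satellite scores and returns the top `count` nodes.
// Ties break on node ID, so the result is identical for every participant.
func PickNodes(scoresBySatellite []map[NodeID]float64, count int) []NodeID {
	total := map[NodeID]float64{}
	for _, scores := range scoresBySatellite {
		for id, s := range scores {
			total[id] += s
		}
	}
	ids := make([]NodeID, 0, len(total))
	for id := range total {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if total[ids[i]] != total[ids[j]] {
			return total[ids[i]] > total[ids[j]]
		}
		return ids[i] < ids[j]
	})
	if count > len(ids) {
		count = len(ids)
	}
	return ids[:count]
}
```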

Q: How to achieve atomicity?
A: Uploading to a storage node is atomic. The client should get responses from all the satellites first and only then upload. If a satellite doesn't respond or rejects the request, retry, or abort and delete the data from the others too (rough sketch below). No leader satellite is required; the client is the leader.
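Roughly like this - `BeginObject`/`CommitObject`/`DeleteObject` are made-up names for the idea, not the actual metainfo API:

```go
package upload

import (
	"context"
	"fmt"
)

// Satellite is an illustrative interface; these are not the real metainfo calls.
type Satellite interface {
	BeginObject(ctx context.Context, path string) error  // reserve the path before any upload
	CommitObject(ctx context.Context, path string) error // record the final piece locations
	DeleteObject(ctx context.Context, path string) error // clean up on abort
}

// uploadPieces stands in for the real erasure-coded upload to storage nodes.
func uploadPieces(ctx context.Context, path string) error { return nil }

// atomicUpload uploads pieces only once every satellite has accepted the
// object; any rejection aborts and deletes whatever was reserved elsewhere.
func atomicUpload(ctx context.Context, sats []Satellite, path string) error {
	for i, s := range sats {
		if err := s.BeginObject(ctx, path); err != nil {
			for _, prev := range sats[:i] {
				_ = prev.DeleteObject(ctx, path) // best-effort rollback
			}
			return fmt.Errorf("satellite rejected upload: %w", err)
		}
	}
	if err := uploadPieces(ctx, path); err != nil {
		for _, s := range sats {
			_ = s.DeleteObject(ctx, path)
		}
		return err
	}
	for _, s := range sats {
		if err := s.CommitObject(ctx, path); err != nil {
			return fmt.Errorf("commit failed: %w", err)
		}
	}
	return nil
}
```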

Q: Which satellite pays the nodes?
A: They all pay proportionately. The storage costs are shared equally and the bandwidth costs are paid for by the satellite that approves them. In practice, I guess, bandwidth would also be load-balanced.
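In code, the split I have in mind is tiny (purely illustrative names, nothing here is the real payout pipeline):

```go
package payouts

type SatelliteID string

// PayoutPerSatellite splits what a node is owed for one period: storage is
// shared equally by the satellites holding the metadata, egress is paid by
// whichever satellite approved those orders.
func PayoutPerSatellite(storageOwed float64, egressOwed map[SatelliteID]float64, sats []SatelliteID) map[SatelliteID]float64 {
	out := make(map[SatelliteID]float64, len(sats))
	share := storageOwed / float64(len(sats))
	for _, id := range sats {
		out[id] = share + egressOwed[id]
	}
	return out
}
```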

These are just some ideas to show that the issue is solvable. It would require a lot of changes, but I think it's possible to get there incrementally.

By the way, I think some people would use multiple Storj Labs satellites for their data even now, if this were implemented. It's very unlikely that the company would act maliciously with the data, but a catastrophic failure in one zone, though still unlikely, is possible.

Why do you think that tripling your metadata would cost nothing? :wink:
Seriously though - it will increase costs anyway. Storing metadata is not free, and some segments (those smaller than their metadata) are stored directly on the satellite (it also acts as a storage node, but only for such segments), so some data would be stored in triplicate. If you use a small segment size, you may end up storing most of your data on the satellites.

3x costs of doing so…

This requires running a satellite (a full node) on the customer's hardware, and it must be online 24/7 with redundancy, otherwise your data may be lost (welcome to Sia and Filecoin) or corrupted (welcome to Filecoin). You also cannot run a full node on a smartphone (yet?) to keep your data safe.

This requires trust between them; it could work for satellites with the same owner, but not always for satellites offered by different companies.
So this would likely have to be implemented on the client side, and it may significantly worsen TTFB instead of improving it.
The only way to solve this is to implement some consensus system, but currently there is no fast consensus system in the wild; the more nodes in the consensus system, the slower it becomes (you need confirmation from the majority, plus latency). So it would only slow down the client.

More time for selection and retries → greater TTFB.

This requires trust or a blazingly fast consensus algorithm, see above. It's also not attractive for a satellite operator to share income, or else the price for the customer would have to be tripled (because each satellite operator would get only 1/3 of the income but would pay the full cost of audit and repair, and hosting and traffic costs would remain the same). How is that different from storing 3x the data?

They can do so even now - upload the data using 3 different satellites. You can configure rclone to use multiple destinations (but please read the caveats!):

In short, I still do not see any benefit from it, except more decentralization at the expense of higher costs for the client.

The only valid case for me is if the customer hosts their own satellite, uses some public one, and replicates data between them. However, they would be forced to pay for the infrastructure and traffic to host their own satellite; they would also have to pay the Storage Node Operators who add their satellite to the trusted list (or host nodes themselves = costs for infra + traffic), and the public satellite operator as well.

That may be so. I would assume it's a very small subset of the data and the effect wouldn't be noticeable.

I agree, it’s suboptimal. For some clients, that wouldn’t matter much, but others would prefer a more complete solution.

It requires that they are able to use each other’s API and have the same algorithm for node selection. In traditional cases, where things like Paxos or pBFT might be used, there’s no central party that sees the whole picture and all the consensus steps are there to avoid a situation where a “consensus” is reached, but in fact nodes chose different answers. In Storj’s case, there is a central node that must see the whole picture if it’s done correctly - the storage node. The storage node can inform the satellites that the consensus wasn’t reached. Of course, the storage node itself might be faulty or malicious, so multiple storage nodes would have to be attempted before deciding that no consensus could be reached.
Similarly, for payouts, it’s necessary that all involved storage nodes trust all involved satellites for payments.
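A rough sketch of that node-side check (all names are hypothetical; signature verification is elided here, though in practice each approval would be signed, just as orders are today):

```go
package observer

type SatelliteID string

// PlacementApproval is what a storage node would receive from each satellite
// involved in an upload.
type PlacementApproval struct {
	Satellite SatelliteID
	PieceID   string
	NodeID    string
}

// DisagreeingSatellites returns the satellites whose approval is missing or
// doesn't match the first one (different piece or different target node).
// The node would report these back instead of accepting the piece.
func DisagreeingSatellites(expected []SatelliteID, approvals []PlacementApproval) []SatelliteID {
	if len(approvals) == 0 {
		return expected
	}
	ref := approvals[0]
	seen := map[SatelliteID]bool{}
	var bad []SatelliteID
	for _, a := range approvals {
		seen[a.Satellite] = true
		if a.PieceID != ref.PieceID || a.NodeID != ref.NodeID {
			bad = append(bad, a.Satellite)
		}
	}
	for _, id := range expected {
		if !seen[id] {
			bad = append(bad, id)
		}
	}
	return bad
}
```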

Agreed. If a user is prepared to pay more for the service with multiple satellites, they might be ok with a somewhat reduced throughput too. Though, I’m not sure ping matters that much with such large chunks and parallelism.

It is! The satellite operator's profit comes from the difference between what the user pays them and what they pay to storage nodes. The storage nodes still get the same amount for the same data, but 3x metadata redundancy across satellites would triple the satellites' combined margin on it.

= Requires trust :slight_smile:
Or - a consensus algorithm.

It cannot be trusted; that's why all confirmations of usage (orders) must be signed by both the uplink (client) and the node.
So the node alone cannot be used as a central point of trust, unfortunately. Only the satellite or the uplink on the customer's side can be such a point.

This is a rough description of a consensus algorithm, which is slow because it requires confirmation of every command from the whole network, and you would set a threshold for how many confirmations are enough to trust the command. This would significantly slow down uploads and downloads - this is the main reason the satellite exists: you do not need consensus, and with an API key your command is executed immediately. We specifically avoid using a consensus algorithm to select nodes for uploads or downloads (until there is a fast one), see

It's not about the ping, it's about the time to first byte. For video that means you would wait several seconds (or even minutes!) before the video starts to play.

You did not get it. The customer would have to pay all three satellite operators, since their (meta)data is attached to these three satellites. So, if they paid the standard price, each operator would receive only 1/3 of what they usually receive. That is not something every satellite operator will accept.

My guess would be that each coordination would take milliseconds (on the order of ~100 ms?), not seconds. Only storage-node-selection failures would significantly slow it down, and those should be very rare. Other than that, it would be like waiting for the slowest of N responses. See probability - Expected value of maximum and minimum of $n$ normal random variables - Mathematics Stack Exchange for a rough idea of the effect.
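For a back-of-the-envelope estimate (modelling each satellite's response time as i.i.d. normal, as in that thread - a simplification, real latencies aren't normal), the expected slowest of $n$ responses satisfies

$\mathbb{E}\big[\max_{1 \le i \le n} X_i\big] \le \mu + \sigma\sqrt{2\ln n}$

With purely hypothetical numbers $\mu = 50$ ms, $\sigma = 20$ ms and $n = 3$ satellites, that's about $50 + 20\sqrt{2\ln 3} \approx 80$ ms - still on the order of ~100 ms, not seconds.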

Ah, I think I see the misunderstanding. Of course, the customer would pay more in total, but the cost of adding one more satellite would be significantly smaller than duplicating the data one more time, since you're only paying for the extra satellite's services, not for extra storage nodes.
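To put hypothetical numbers on it (purely for illustration, this is not the real price breakdown): suppose 70% of today's price covers what the satellite pays out to storage nodes and 30% covers the satellite's own services (metadata, audit, repair, margin). Storing the data via 3 satellites the current way costs 3 × 100% = 300% of the single-satellite price, while storing the pieces once and replicating only the metadata to 3 satellites would cost roughly 70% + 3 × 30% = 160% (ignoring the small-segment caveat above). The exact ratio depends entirely on that hypothetical split, but metadata-only replication wins whenever the node payouts are the larger share of the bill.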

I’m just throwing the idea out there. Of course, it might not be a current priority for you or you might not want to have this available for other reasons, but I think this would be nice to have for everyone. :smile:

It is very welcome, but right now the economics and the math don't allow us to implement this.
We found a balance; it's not ideal, but at least it doesn't affect customers.

But eventually we will find a way to increase decentralization without affecting the customers' experience.
