Distribute audits across storagenodes

BrightSilence · August 13, 2019, 2:18pm

I’ve been wondering whether some of the tasks a satellite performs can be distributed across nodes. Mainly auditing and repairing. I took some time to outline a possible solution for distributed auditing first since that seemed less complicated. Turns out it was longer than I expected, so I’ll leave the challenge to tackle repair for some other day. I’m sure there are holes in this approach that I overlooked, so don’t hold back, I can take it.

Idea also posted here: https://ideas.storj.io/ideas/V3-I-40

Currently audits are done by the satellite by retrieving a stripe from all storagenodes that store an erasure share for that stripe. These erasure shares are then audited by using a Berlekamp-Welch algorithm to detect which (if any) may be corrupt.

Moving this process to storagenodes introduces a view challenges

Storagenodes should be considered untrusted environments, so any work they do needs to be verified.
With the planned removal of Kademlia, storagenodes have no ability to find other nodes themselves.
Storagenodes need to be able to determine whether audit requests they get from other storagenodes are legitimate.

Selection of nodes and stripes to be audited
A recent design document already outlined a new method of selecting nodes and stripes to be audited. This task should remain in control of the satellite. Next the satellite should exclude nodes that are currently offline (listed in the failed_uptime_checks table with back_online IS NULL).

Distributing audit tasks
Because we can’t assume the node performing the audit can be trusted, we need to verify the results of the audits performed. If we let one node run the Berlekamp-Welch algorithm to determine which erasure shares are corrupt, there is no way for the satellite to verify that result. A node could simply claim all erasure shares are correct and move on.
To prevent that possibility, the remaining nodes should be split up in batches of size n, with n being the minimum number of erasure shares required to recreate the stripe (currently 29). The satellite will select a node for each batch to perform the audits and and send it an audit task containing signed download requests that contain the node IDs, node addresses and erasure shares to be audited.

Auditing
The node performing the audit sends the signed audit request to each of the n nodes for the specific erasure share. The nodes receiving the audit request check the signature to make sure it is a legitimate audit request initiated by a trusted satellite and return the requested erasure share. The auditing node will then us erasure coding to try to recreate the stripe.

If successful, it will create a hash of the stripe and send that hash back to the satellite to verify it against a stored has for that stripe.
If not successful, it will return an audit failure message to the satellite.

Fallback
If the satellite for this stripe receives an audit failure from one of the auditing nodes or detects a mismatch between the returned stripe hash and the stored stripe hash, it will need to fall back to doing an audit on the entire stripe like before to detect which nodes/erasure shares are corrupt. This audit could optionally exclude nodes in batches that did return the correct stripe hash. Auditing on the satellite would only need to take place if one of the distributed audits fails.

Upsides

By moving most of the work for audits to nodes, the resources available to perform audits scale with the number of nodes
The satellite is only performing audits when a problem has been found

Downsides

Still requires the satellite to perform audits in case of failure on the distributed audit
Chance of failure on the distributed audit is relatively high since all erasure shares are required to recreate the stripe, therefor it is essential that nodes are actually online. This is only partially mitigated by checking the failed_uptime_checks table and might require uptime checks to happen right before the audit is distributed

Remaining considerations:

Should nodes be compensated for doing this work and if so how?
Should auditor participation by storagenodes be a setting and if so, what should the default be?
How to make sure a node does not get too many audit tasks at the same time?
What time outs should be considered for 1) waiting for erasure shares to be downloaded from nodes being audited and 2) the entire process of auditing and returning the stripe hash to the satellite

kaloyan · August 20, 2019, 12:37pm

@BrightSilence Thank you for taking the time to propose this idea in such a structured way!

I agree it makes sense to distribute more tasks to storage nodes and offload the satellite. This is our long term plan too.

Your proposal assumes that the satellite keeps hashes of the stripe being audited, but this is not actually true. Currently, the satellite selects a segment to audit, and then a random stripe from this segment. Then it requests all nodes storing a piece from this segment to return the respective erasure shares. Then the satellite uses Berlekamp-Welch algorithm to detect if any of the returned erasure shares is an outlier in the reconstructed polynomial (stripe).

The beauty of the Berlekamp-Welch algorithm is such that it does not require any prior knowledge about the stripe (like a hash). The erasure shares from the storage nodes are all that is required to detect if any of them is wrong data.

Hence, the satellite does not store any stripe hashes. Also, it is not practical for doing this, because the audit is designed in a way that any stripe can be audited at any time. Keeping hashes for all possible stripes is comparable to storing the entire data on the satellite.

BrightSilence · August 20, 2019, 1:13pm

I realized this may have been an issue later on. However, it could still work except instead of comparing the hash to a stored hash you compare the hashes you get back from multiple auditing nodes checking on different batches of the same stripe. That way no stripe hash storage on the satellite is required, but you can still verify both auditing nodes recreated the same stripe based on different sets of erasure shares.

This approach does require at least 2n pieces to be available (n being the minimum number of erasure shares required to rebuild the piece). Other pieces could only be audited by the satellite as is.

BrightSilence · August 20, 2019, 1:20pm

hrmm, actually, it wouldn’t matter if there is overlap. Even with only n pieces you can have 2 nodes audit the same n erasure shares and compare results.

Say n = 29. There are 35 pieces available. Give 2 nodes each 29 erasure shares to check such that all are covered and 23 overlap. Compare results. Job done.

kaloyan · August 20, 2019, 1:25pm

OK. This makes sense. The satellite may request 2-3 storage nodes to perform the same audit. If the result from all of them is successful and they return the same stripe hash then the satellite can accept this audit as successful. Otherwise, the satellite should fallback to normal centralized audit.

The main question for me is if this is economically viable. The satellite has to send all contact info to the auditing storage node. This is a message with the Node ID and address for all 80 (or less) nodes holding a piece from the segment. Sent to 2 or 3 nodes. This is egress data for the satellite, which is paid to the cloud provider where the satellite is running. Compare this to the current approach where the satellite downloads the erasure shares from the storage nodes, which is ingress data available for free from most cloud providers.

We have to measure the and ingress and egress data in both cases and make the calculations.

BrightSilence · August 20, 2019, 1:28pm

Yep, that was the danger with trying to tackle audits first. There is likely much more to win with repairs.
For audits it might indeed not be economically viable. Either way, I enjoyed the thought exercise. If it helps in any way, even better. But if not, I’m ok with that too.

kaloyan · August 21, 2019, 10:21am

These were all good thoughts! Thanks for sharing them! We are eager to hear more of your ideas.

BrightSilence · November 4, 2022, 7:52pm

@danw @john I noticed you mentioned audit scaling during the twitter spaces. I was just curious to see if a distributed delegated audit system like this was being considered? The upside of course being that if nodes do audits, it will scale automatically with the size of the network instead of having to scale services at the satellite end.

Vadim · November 4, 2022, 8:43pm

I think there is possible to spin some special nodes with contracts made only for auditing. And report back to sattelite results. They will have ask data once a day in compresed way, and do it work. 50-100 of this nodes will do work easily.

Mitsos · May 12, 2024, 9:18am

zk = zero knowledge. I (as a node) don’t need to know anything about the underlying asset. All I need (as a node) is a list of segments to audit, and a list of nodes storing those segments. We could even go a step further and reverse that. As a node being audited, I can get a list of nodes to send my hashes to, or even add the hashes to a block. That way the auditing node doesn’t need any information on who is actually storing the segments. When 100 auditing nodes get the hashes from the blockchain, they sign the current epoch (verifying that all segments do indeed have the same hashes) and the satellite steps in if there are any discrepancies (ie if an audited node added a wrong hash, go to that node directly and ask for a new hash/verify other segments).

In this case instead of the satellite doing all the work, the few VIP nodes do the majority of the work and the satellite is there to verify that everything is going according to plan.

Humans didn’t have a solution to go to the moon until they did. On why “reinventing the wheel”:

Because the wheel is currently more hexagonal than round. Yes it works, but not the way it’s supposed to work.

Respectfully: I (and others) are providing constructive criticism based on the information we have at our disposal. Again respectfully, automatically switching to a “nope, can’t be done” stance isn’t productive.

Toyoo · May 12, 2024, 1:26pm

Respectfully, I’d have to see the actual math written down to prove that this would work.

Alexey · May 12, 2024, 2:01pm

Actually it can work. The math already implemented, but need some polishing, how to verify segments from the other node without knowing the expected hash (nodes do not have this information).