Got Disqualified from saltlake

BrightSilence · August 20, 2021, 8:18pm

That may have been the initial goal, but the suggestion's main focus is to give the score a longer memory. Even without tuning, the current system disqualifies a node after 10 consecutive failures while my suggestion would require at least 40 consecutive failures. That already gives you 4x as much time to fix things. If that memory is further increased for nodes with a lot of data, you could easily tweak it to give those nodes 10x as much time. I would agree that it’s not a full solution and it would be better for the node to protect itself. But you can’t argue that that wouldn’t help a lot already.
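To make the "memory" point concrete, here is a rough sketch of how the number of consecutive failures depends on the forgetting factor. It assumes a beta-reputation style score (alpha / (alpha + beta), with both counters decayed by a factor lambda on every audit). The function name and the parameter values used below (lambda 0.95 with a 60% threshold, and lambda 0.999 with a 96% threshold) are my assumptions, picked because they reproduce the 10 and roughly 40 failures mentioned above; they are not taken from the satellite code.

```python
def failures_until_disqualified(lam: float, threshold: float, weight: float = 1.0) -> int:
    """Count consecutive failed audits until the score drops below `threshold`.

    Hypothetical model: score = alpha / (alpha + beta), where each audit
    decays both counters by `lam` and adds `weight` to alpha (success)
    or beta (failure).
    """
    # Start from the steady state of a node that has only ever passed audits.
    alpha = weight / (1.0 - lam)
    beta = 0.0
    failures = 0
    while alpha / (alpha + beta) >= threshold:
        # A failed audit: decay both counters, add the weight to beta only.
        alpha = lam * alpha
        beta = lam * beta + weight
        failures += 1
    return failures


if __name__ == "__main__":
    # Assumed "current" settings: short memory, low threshold.
    print(failures_until_disqualified(lam=0.95, threshold=0.60))
    # Assumed "suggested" settings: long memory, high threshold.
    print(failures_until_disqualified(lam=0.999, threshold=0.96))
```

Under those assumed values the first call returns 10 and the second returns 41, so the longer memory gives a failing node about four times as many audits before disqualification, even though the threshold itself is much stricter.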

Making this part dynamic lets you aim for a specific time frame in which nodes can repair issues. That should take care of the fact that it would otherwise take 2 weeks for smaller nodes and a few hours for larger nodes. The time frames would be much closer together across all nodes, and you could aim for something like 2 days for everyone.
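One way to picture the dynamic part: if you know roughly how many audits a node gets per day, you can pick the forgetting factor so that a node failing every audit hits the threshold after the same number of days, whatever its size. This is only a sketch under the same assumed decay model (score falls by a factor lambda per consecutive failure, starting from a perfect score); the function name, the audit rates, and the 96% threshold are illustrative assumptions, not real satellite settings.

```python
def forgetting_factor_for_window(audits_per_day: float, target_days: float, threshold: float) -> float:
    """Pick lambda so that `audits_per_day * target_days` consecutive failures
    bring a perfect score down to exactly `threshold`.

    Assumes score after n consecutive failures is lambda ** n (hypothetical model).
    """
    if audits_per_day <= 0 or target_days <= 0:
        raise ValueError("audits_per_day and target_days must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1")
    n = audits_per_day * target_days
    # Solve lambda ** n == threshold for lambda.
    return threshold ** (1.0 / n)


if __name__ == "__main__":
    # Illustrative audit rates for a small and a large node (assumed numbers).
    for audits_per_day in (20, 2000):
        lam = forgetting_factor_for_window(audits_per_day, target_days=2, threshold=0.96)
        print(audits_per_day, lam)
```

Nodes with more data get more audits and therefore end up with a lambda closer to 1, which is exactly the "longer memory for nodes with a lot of data" idea: both nodes in the example get about 2 days.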

I am not against putting nodes in a warning state earlier, before disqualifying them… in fact, I suggested exactly that at the bottom of that topic. However, it would require making the score more stable first, otherwise nodes would just constantly pingpong in and out of that warning state.

Tuning audit scoring

Bonus suggestion
Now that there is a more consistent representation of node data quality, we can actually do something new. We could mark the pieces of a node with an audit score below the warning threshold (for example 95%) as unhealthy. And at the same time lower the incoming traffic for that node (they could be part of the node selection process for unvetted nodes). This will result in data for that node slowly being repaired to other nodes, potentially reducing the troublesome data and fixing it, while at the same time lowering the risk of new data stored on that node getting lost. This will rebalance good and bad data on the node and will allow nodes that have resolved the underlying issue to slowly recover, while at the same time nodes that still have issues will keep dropping in score anyway and fail soon enough. This also provides an additional incentive to keep your node at higher audit scores to keep all ingress and prevent losing data to repair.

I wouldn’t advocate for that. Having money at stake is one of the reasons I never got started with Sia. Maybe if it’s losing income, like temporarily holding back held amount again or something. That might work. But I don’t want to hide away an issue with just monetary consequences instead of disqualifications. In my opinion it should be possible to prevent honest nodes from having those bad consequences altogether. The way I see it, right now honest nodes are at a disadvantage, because someone abusing the system is definitely just going to take their node offline to postpone being disqualified, yet honest nodes currently don’t do that. So let’s try to make the node protect itself first.