Put the node into an offline/suspended state when audits are failing

BrightSilence · August 15, 2021, 6:46pm

It’s a difficult balance. Keeping nodes that have lost data on the network is dangerous, so any easing of the disqualification rules has negative side effects. I do think disqualification can happen a little too fast, and an earlier suggestion I made, "Tuning audit scoring", would make that a little better.

But to be honest, that’s only a workaround. Instead of having the satellite deal with this, I think the node software itself should do a better job of monitoring itself. We’ve seen a little too frequently that nodes become basically unresponsive, to the point where even log lines stop being written, yet remain responsive enough to still accept the audit challenge.

In my opinion, nodes should monitor their own performance and if they are not capable of responding to a request fast enough, they should just terminate. This saves the satellite from having to find out what is going on and having to distinguish between a node running into issues or a node trying to fool the audit system.
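
To make the idea concrete, here is a rough sketch of that kind of self-check. It is not how the storagenode software works. The class name `ResponsivenessMonitor`, the thresholds, and the sliding-window rule are all made up for illustration. It only shows the decision logic: remember whether the last few requests were too slow, and report when too many of them were.

```python
from collections import deque


class ResponsivenessMonitor:
    """Hypothetical self-check: flags the node as unhealthy when too many
    of the most recent requests took longer than an allowed latency."""

    def __init__(self, max_latency_s=5.0, window=10, max_slow=3):
        # All three values are illustrative assumptions, not real settings.
        self.max_latency_s = max_latency_s
        self.max_slow = max_slow
        # Keeps one boolean per recent request: True means "too slow".
        self._recent = deque(maxlen=window)

    def record(self, latency_s):
        """Store whether a finished request exceeded the allowed latency."""
        self._recent.append(latency_s > self.max_latency_s)

    def should_terminate(self):
        """Return True once enough recent requests were too slow."""
        return sum(self._recent) >= self.max_slow
```

The node would call `record()` after each request and shut itself down once `should_terminate()` returns True. A real implementation would probably also need a separate timer thread, since a request that hangs forever never reaches `record()` at all.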

As it stands, I can’t vote for this suggestion, because I don’t think it is the right solution. But I definitely recognize the underlying problem and feel like that needs to be solved.