Put the node into an offline/suspended state when audits are failing

Suspension for audit failures opens the door to exploits.
This suspension will eventually be removed, once all kinds of unknown errors have been sorted out.
For example, “I/O device error” errors are treated the same way as “file not found”: they affect the audit score immediately.

Some errors (like a disconnected drive) are already mitigated by the storagenode shutting down.
So, sooner or later there will be no “unknown” errors and no suspension for failed audits.

The suspension could remain if we could make it expensive enough for the affected node that abuse becomes pointless. But in that case it would be no better than disqualification.

What I mean:
If a node is failing audits because of timeouts (which is certainly very easy to exploit), we could put it into suspension (no egress, no ingress, only audits) and start decreasing the held amount every hour, so that it reaches zero within 24 hours at most; if it took longer, the abuse would still be profitable. As soon as the held amount reaches zero, the reputation is reset: the node starts again from a 75% held back percentage, is switched back to vetting, and its data is considered lost with all the consequences. If a repair job starts, the data on that node will be slowly removed by the garbage collector. At the end of the week, the node will be disqualified.
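
To make the idea concrete, here is a rough sketch of that drain in Python. It is only an illustration of the proposal, not how the satellite works: the `Node` fields, the `suspend` and `tick_hour` functions, the linear hourly drain, and the starting held back percentage of 0 are all my assumptions. The disqualification at the end of the week is not modelled.

```python
from dataclasses import dataclass

DRAIN_HOURS = 24           # proposal: the held amount must hit zero within 24 hours
RESET_HELD_BACK_PCT = 75   # proposal: node restarts from 75% held back


@dataclass
class Node:
    """Hypothetical node record; field names are illustrative only."""
    held_amount_cents: int
    held_back_percent: int = 0      # assumed starting value for an old node
    suspended: bool = False
    vetted: bool = True
    data_considered_lost: bool = False
    hourly_drain_cents: int = 0


def suspend(node: Node) -> None:
    """Suspend the node: no egress, no ingress, only audits (traffic not modelled)."""
    node.suspended = True
    # Assumed linear drain. Ceiling division guarantees zero within DRAIN_HOURS.
    node.hourly_drain_cents = -(-node.held_amount_cents // DRAIN_HOURS)


def tick_hour(node: Node) -> None:
    """Apply one hour of the penalty to a suspended node."""
    if not node.suspended or node.data_considered_lost:
        return
    node.held_amount_cents = max(0, node.held_amount_cents - node.hourly_drain_cents)
    if node.held_amount_cents == 0:
        # Reputation reset: back to vetting, 75% held back, data treated as lost.
        node.held_back_percent = RESET_HELD_BACK_PCT
        node.vetted = False
        node.data_considered_lost = True


node = Node(held_amount_cents=2400)
suspend(node)
for _ in range(DRAIN_HOURS):
    tick_hour(node)
print(node)
```

The point of the 24 hour cap is visible in `suspend`: whatever the held amount is, the hourly drain is sized so the node cannot sit in suspension longer than a day without losing all of it.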
