I received an email just now telling me that my node has been suspended by one of the US satellites because it ‘produced too many errors during audits’. However, when I look on my node dashboard I see all my audits and uptime at 100%.
Is there something wrong either with the satellite to cause this, or is there a condition in which there could be unreported audit failures in my dashboard?
All the storage for my node is on my fileserver, which uses a redundant ZFS filesystem (so any read errors should be transparently found and repaired). I had a drive fail last week, but that was automatically replaced by a hot spare with no data loss/corruption, so that shouldn’t be the cause.
Are there any troubleshooting steps I can take from my end to figure out where the issue actually lies here?
Edit: it seems Asia East has also now suspended me, but still nothing in my dashboard…
Update: I’ve finally seen something in my dashboard: on two satellites my audit percentage has dropped off to 99.98% (which itself seems odd to me). But this is a far cry from the ‘audits have dropped to the point that I’m being suspended’…
A drop in the audit score does not cause a suspension. When the audit score drops below 60%, the node is disqualified.
So, if the online score is 100%, then it is the suspension score that dropped below 60%. But if the problem that prevented the node from passing audits/GET_REPAIR is solved, and the node starts to pass audits and provide requested pieces without errors, the suspension score will recover quickly (with each successful audit/repair transfer).
When you check for errors you need to check for both GET_AUDIT and GET_REPAIR, since they both affect either the suspension score (in case of unknown errors during an audit or repair transfer) or the audit score (in case of known errors like “file not found” or ongoing timeouts).
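If it helps, here is a rough Python sketch of that check. It only counts log lines that mention GET_AUDIT or GET_REPAIR together with a failure marker. The log format is an assumption on my part (lines containing the word `failed` or the level `ERROR`), and `count_failed_transfers` is just a hypothetical helper name, so adjust the pattern to whatever your node actually writes.

```python
import re
from collections import Counter

# The two actions named above as affecting the audit/suspension scores.
ACTIONS = ("GET_AUDIT", "GET_REPAIR")

# Assumption: a failed transfer is logged on a line containing the word
# "failed" or the level "ERROR". Change this to match your real log format.
FAILURE_PATTERN = re.compile(r"\b(failed|ERROR)\b")


def count_failed_transfers(lines):
    """Count failure lines per action (GET_AUDIT / GET_REPAIR)."""
    counts = Counter()
    for line in lines:
        # Skip lines that do not look like failures at all.
        if not FAILURE_PATTERN.search(line):
            continue
        # Attribute the failure to whichever audit/repair action it names.
        for action in ACTIONS:
            if action in line:
                counts[action] += 1
    return counts
```

To use it, open your node's log file and pass the file object in, for example `count_failed_transfers(open(path))`, where `path` is wherever your logs are written. A non-zero count for either action tells you which lines to read more closely for the actual error message.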