Node getting suspended despite not missing any audit checks

Hi all,

I’m getting emails from two satellites (so far) that my node has been suspended. The email says “You won’t receive any new data on this Satellite until you resolve the issue causing audit failures on your node.” However, both satellites show a 100% audit check success rate in the dashboard. The node is running version 1.5.2 atm and still has about 5% disk space (~450 GB) free.

Does anybody know what is causing this and how to fix it?

Now it’s happening on a third satellite as well :(

The audit score on the dashboard is the score for disqualification, not for suspension. Unfortunately the suspension score is not visible for now.

The most likely culprit is a known issue with database locks that has been fixed in 1.6.3. The upside is that your node will likely fix itself even before that version reaches you, because the suspension causes the load on your node to drop and it can catch up.

It’s still worth investigating though. Please check your log for lines that contain both GET_AUDIT and failed. If you see lines about database locks on used_serials.db, it was indeed this issue and the next version will fix your problem permanently. If there is something else, please report back.

I ran docker logs storagenode | grep lock and it did indeed show some database locks. I guess I’ll just wait until watchtower updates my node.

Try:

docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed

And post some of the results here if they show anything other than locks on used serials. Only the locks on used serials will be fixed, so any other errors are still worth reporting.
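
If the log is long, it may be easier to split the failed audits into "locked database" and "everything else". Below is a rough sketch of that, not an official tool. The function names are made up for illustration, and it assumes the lock errors contain the text "database is locked". Both functions read the log on stdin.

```shell
# Sketch: split failed audit lines into "locked database" vs. everything else.
# Function names are hypothetical; the match string "database is locked"
# is an assumption about how the lock error appears in the log.

# Count failed GET_AUDIT lines caused by a locked database.
count_lock_failures() {
  grep 'GET_AUDIT' | grep 'failed' | grep -c 'database is locked'
}

# Print failed GET_AUDIT lines that have some other cause.
other_audit_failures() {
  grep 'GET_AUDIT' | grep 'failed' | grep -v 'database is locked'
}
```

For example, assuming the container is named storagenode, docker logs storagenode 2>&1 | other_audit_failures should print only the failures worth reporting here. The 2>&1 is there in case the node writes its log to stderr. If nothing is printed, every failed audit in the log was a database lock.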

As far as I can see it is only cases of “usedserialsdb error: database is locked” so that looks like the error you mentioned.

Good, that means your node will recover automatically and the next version will permanently fix your issue. Nothing to worry about then.