Cancelled audits

You can work around audits failing while the node is overloaded by an external process, like a scrub job or a migration: just stop the node, then start it periodically so the online score does not drop too low, until the high-load maintenance is finished.
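For a Docker-based node, one illustrative way to schedule this is a crontab fragment like the one below. The container name `storagenode` and the timings are only examples; adjust them to your own setup and to how long your maintenance runs.

```
# Illustrative crontab sketch: bring the node online for 10 minutes at
# the top of every hour so the online score does not drop too far while
# the maintenance is running.
0 * * * *   docker start storagenode
10 * * * *  docker stop -t 300 storagenode
```

The `-t 300` gives the node up to 5 minutes to shut down gracefully instead of being killed mid-operation.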

Basically, yes. We would be happy if you could keep it online, but if you cannot, it is better to be offline than slow. We want customers to have a great experience: an offline node is excluded from any node selection.

Then maybe they would notice a yellow audit score and try to figure out what is going on. Then they would notice scary audit errors, and I hope they would come here and ask.
When I have a clear picture and several confirmations that this change affects a noticeable part of the network, we may reconsider. Otherwise I would add a FAQ article here and/or on the support portal and/or the docs portal.

You may search for the same PieceID to see how many times it has failed.
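A minimal sketch of such a search, using a fabricated log excerpt with placeholder piece IDs (`AAAA`, `BBBB`); in a real setup you would pipe `docker logs storagenode` or read the node's log file instead, and grep for the actual PieceID from the error you saw:

```shell
# Fabricated sample log, standing in for the real node log:
cat > /tmp/node.log <<'EOF'
2024-01-01T00:00:00Z ERROR piecestore download failed {"Piece ID": "AAAA", "Action": "GET_AUDIT"}
2024-01-01T01:00:00Z ERROR piecestore download failed {"Piece ID": "AAAA", "Action": "GET_AUDIT"}
2024-01-01T02:00:00Z INFO  piecestore downloaded {"Piece ID": "BBBB", "Action": "GET_AUDIT"}
EOF

# Count how many audit failures mention the same piece:
grep 'GET_AUDIT' /tmp/node.log | grep 'failed' | grep -c 'AAAA'   # prints 2
```

If the same PieceID shows up as failed several times, it is the retries of one audit rather than many different pieces failing.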

The retry would not come immediately.
However, maybe there has been a change in containment mode too, because according to design-docs/20190909-auditing-and-node-containment.md at ed8bfe8d4c66a587f5322237f06208f74e301b7a · storj/design-docs · GitHub, no repairs were involved.

I do not think so. If you changed it to a higher value than the default, that means the disk on that node cannot keep up, and reducing it would likely crash the node even more often than before. This check is performed every 1m0s by default; I do not think your node is audited with the same frequency. By the way, you also need to increase the readable check interval by the same amount, otherwise it does not make sense: these checks will overlap if the disk truly responds only after 2 minutes.
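For reference, a sketch of what that could look like in the node's `config.yaml`, keeping both intervals in step. The option names follow the storagenode's `storage2.monitor` settings, but verify them against your own version's `config.yaml`, and treat the `5m0s` value as an example only:

```yaml
# Raise both directory checks by the same amount so they do not overlap.
# The default for each is 1m0s.
storage2.monitor.verify-dir-writable-interval: 5m0s
storage2.monitor.verify-dir-readable-interval: 5m0s
```

The node must be restarted for config changes to take effect.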
Since the node is not disqualified, I can assume that the percentage of failed audits is still low.