Questions regarding audit and online scores

Hey everyone.

So I have a relatively new node online. I have had a few issues and taken my node down a few times, and my online score has gone down as I expected.

However, some satellites now report a 99.8% audit score for my node.

What is the audit score? Am I missing pieces on my node? If an audit finds missing pieces, are they repaired or replaced? I've tried reading a few posts here on the forum and elsewhere online, and I can't figure out why my audit score is going down. All my disks are healthy and everything is running normally. Is there a way to check why I am losing audit score? Is there a way to bring my audit score back up?

If the audit score is affected, your node either has missing pieces or has corrupted ones. The score can also go down if your node fails to provide a piece for an audit within 5 minutes, three times for the same piece. In all of these cases your node is online: it responds to the audit request but cannot provide the piece, or the piece is missing or corrupted. The audit score is not related to the online score; all scores are independent of each other.
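The timeout rule above can be modelled roughly as follows. This is a hypothetical sketch: the 5-minute limit and the three-strike count come from the post, while the `ContainmentTracker` class and its `record` method are invented for illustration and do not reflect the satellite's actual code. It assumes the node is online, since an offline node affects the online score instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Values taken from the description above; not read from Storj's source code.
AUDIT_TIMEOUT_SECONDS = 5 * 60
MAX_TIMEOUTS_PER_PIECE = 3


@dataclass
class ContainmentTracker:
    """Toy model of how repeated audit timeouts turn into a failed audit.

    Assumes the node is online; offline time is tracked by the online score.
    """
    timeouts: Dict[str, int] = field(default_factory=dict)

    def record(self, piece_id: str, response_seconds: Optional[float], piece_ok: bool) -> str:
        """Classify one audit attempt as "success", "pending" or "failure".

        response_seconds is None when the node did not answer in time.
        """
        if response_seconds is None or response_seconds > AUDIT_TIMEOUT_SECONDS:
            count = self.timeouts.get(piece_id, 0) + 1
            if count >= MAX_TIMEOUTS_PER_PIECE:
                # Third timeout for the same piece: counted as a failed audit.
                self.timeouts.pop(piece_id, None)
                return "failure"
            self.timeouts[piece_id] = count
            return "pending"
        # The node answered in time, so any earlier timeouts are cleared.
        self.timeouts.pop(piece_id, None)
        # An in-time answer still fails if the piece is missing or corrupted.
        return "success" if piece_ok else "failure"
```

In this model a slow node gets two "pending" chances per piece, while a node that answers promptly with "file does not exist" fails immediately.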

So I would suggest using this article to troubleshoot why your audit score is affected:

Hey Alexey,

Thank you for this! I went through my logs, and it looks like I have 4 failed GET_AUDIT requests with “file does not exist”.

I also have one failed GET_REPAIR with “file does not exist”.
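A log check like the one above can be scripted. This is a minimal sketch, assuming each failed transfer is logged on one line containing the text `download failed` and ending with a JSON object that has `Action` and `error` fields. That format is an assumption: verify it against your own node's logs, because it may differ between versions.

```python
import json
from collections import Counter
from typing import Iterable


def count_failed_downloads(lines: Iterable[str]) -> Counter:
    """Count failed downloads by (action, error) from storagenode log lines.

    Assumed line format: "... download failed ... {json}". Lines that do
    not match are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        if "download failed" not in line:
            continue
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip lines whose trailing JSON is truncated or malformed
        action = fields.get("Action", "UNKNOWN")  # e.g. GET, GET_AUDIT, GET_REPAIR
        error = fields.get("error", "")
        counts[(action, error)] += 1
    return counts
```

To use it, pass an open log file (any iterable of lines) and print the result of `most_common()`; the `(action, error)` keys show at a glance how many GET_AUDIT and GET_REPAIR failures share the same cause.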

Is there a way to repair this? Is it possible to bring my node's audit score back up?

Unfortunately, no. Missing pieces are recovered only when the number of healthy pieces for the segment falls below the repair threshold, and even then they are recovered to other nodes, not yours.

Ok.

Will I continue to be audited for these pieces I’ve already failed for? Or have I been marked as not having the piece?

It doesn't seem very fair that the node operator is punished for what, in my mind, amounts to a satellite error. My node completes the work and even responds with “file does not exist”. To me this is a soft fail, and it is a bad metric to use for disqualification. I don't have bad data or failing hardware.

This is far from a soft fail. Your node accepted the piece, confirmed that the piece was saved, and was paid for storing it. Then, when asked to provide either proof of the piece (GET_AUDIT) or the actual piece (GET or GET_REPAIR), it returned a “file does not exist” error. That sounds like an issue on your node.

How did you lose the piece? It depends.

You said you have taken the node down a few times due to issues. What issues, and more importantly, how did you stop the node? Did you stop the container, or send a signal to the process and wait for it to finish? Or did you abruptly reboot the hardware?
You could have lost a piece if it was accepted but did not get a chance to be flushed to storage; a power failure, for example, could result in lost pieces. Or maybe you have filesystem issues, in which case you will likely discover more missing pieces.
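To illustrate what "flushed to storage" means, here is the generic POSIX pattern for making a write survive a crash: write to a temporary file, `fsync` it, atomically rename it into place, then `fsync` the directory. This is a general technique, not the storagenode's actual implementation, and the function name `write_durably` is hypothetical. If the power fails before the `fsync` calls complete, the data may exist only in memory and be lost, which is the failure mode described above.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data so it survives a crash once this function returns (POSIX)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move data from Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)         # do not leave a half-written temp file behind
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)            # persist the new directory entry as well
    finally:
        os.close(dir_fd)
```

Skipping either `fsync` call makes the write faster but leaves a window in which an acknowledged write can vanish after an abrupt reboot.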

When a node is new, it stores only a few pieces, so there is a higher probability that an audit stumbles on a missing one.
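The effect of node size can be put into numbers under a simplifying assumption: each audit picks one of the node's pieces uniformly at random and independently. The satellite's real selection logic is not modelled here, and the function name below is illustrative only. With $m$ missing pieces out of $n$ and $k$ audits, the chance of at least one hit is $1 - (1 - m/n)^k$.

```python
def prob_audit_hits_missing(total_pieces: int, missing_pieces: int, audits: int) -> float:
    """Probability that at least one of `audits` audits hits a missing piece.

    Assumption: audits choose pieces uniformly at random with replacement.
    """
    if total_pieces <= 0:
        raise ValueError("total_pieces must be positive")
    missing_fraction = missing_pieces / total_pieces
    return 1.0 - (1.0 - missing_fraction) ** audits
```

For the same 10 lost pieces and 100 audits, a node holding 1,000 pieces is hit with probability of about 0.63, while a node holding 100,000 pieces is hit with probability of about 0.01.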

At this point I suggest finding the root cause of the problem, preventing it from happening again, and creating a new node. Scrap this one; you don't know the extent of the damage.

Yes, until this piece is either recovered to a different node or deleted by the customer.
The satellite audits random pieces, so it can audit the same missing piece again.

https://review.dev.storj.io/c/storj/storj/+/10583

So it could be in the next release.