Hi, over the last 4 weeks one of my nodes has failed 8 audit requests. The node is running on the default backend, piecestore. Am I missing data somehow?
Each of the failed audits looks like:
2025-05-12T18:06:07Z ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "BOOPW75ZPRAFWXIMI5ABY4TSBTMHTHZLRTTHJLIRPT7VSUCOMNJA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "Offset": 1762048, "Size": 256, "Remote Address": "35.188.235.2:15964", "error": "hashstore: file does not exist", "errorVerbose": "hashstore: file does not exist\n\tstorj.io/storj/storagenode/hashstore.(*DB).Read:359\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).Reader:298\n\tstorj.io/storj/storagenode/piecestore.(*MigratingBackend).Reader:180\n\tstorj.io/storj/storagenode/piecestore.(*TestingBackend).Reader:105\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:676\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:302\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:62\n\tstorj.io/common/experiment.(*Handler).HandleRPC:43\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:166\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:108\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:156\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
Are you seeing the audit % drop in the UI, or is this only something you see in your logs? You could run chkdsk/fsck/scrub to check for filesystem errors… but if I didn’t see my audit number dropping I’d probably just ignore it.
Maybe audits are coming in and both the piecestore and hashstore backends are being queried… and since you’re not using hashstore, it’s just failing there without actually affecting anything?
Interesting. Perhaps this piece was moved to the trash and the restore command that was sent didn’t recover it (because it arrived too late).
I do not remember: is the PieceID logged at info level when it’s moved to the trash?
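One way to check is to grep the node’s logs for every entry mentioning the failed PieceID, to see whether it was ever uploaded, trashed, or collected before the audit. The snippet below demonstrates this against the error line quoted in this thread (truncated); in practice you would point grep at your real log, e.g. `journalctl -u storagenode | grep -F "$PIECE_ID"`. The log location and service name are assumptions that depend on your setup.

```shell
# Hedged sketch: find all log entries that mention the failed piece.
# Shown against the (truncated) error line from this thread, since log
# paths vary; replace the printf with your real log source.
PIECE_ID="BOOPW75ZPRAFWXIMI5ABY4TSBTMHTHZLRTTHJLIRPT7VSUCOMNJA"
printf '%s\n' \
  '2025-05-12T18:06:07Z ERROR piecestore download failed {"Piece ID": "BOOPW75ZPRAFWXIMI5ABY4TSBTMHTHZLRTTHJLIRPT7VSUCOMNJA", "Action": "GET_AUDIT"}' \
  | grep -F "$PIECE_ID"
```

If trash retention covers the window, matching lines would tell you whether the piece passed through the trash or a collector run before the failed audit.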
Or a piece was really lost, and hashstore was checked after piecestore.
I have one more idea: could you please search for one of these pieces on your disk in the piecestore?
See how to convert a PieceID to the file:
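As a rough sketch of that conversion (the layout below is the commonly described piecestore blob layout, so treat the exact path format as an assumption): the PieceID from the log is base32; lowercased, its first two characters name a subdirectory under `blobs/<satellite-dir>`, and the remaining characters plus a `.sj1` extension name the file.

```shell
# Hedged sketch: turn the logged PieceID into its likely on-disk path.
# Assumed layout: <storage>/blobs/<satellite-dir>/<first-2-chars>/<rest>.sj1,
# where the names are the lowercase form of the base32 ID from the log.
# <satellite-dir> is the satellite ID in its on-disk encoding; it is left
# as a placeholder here.
PIECE_ID="BOOPW75ZPRAFWXIMI5ABY4TSBTMHTHZLRTTHJLIRPT7VSUCOMNJA"
lower=$(printf '%s' "$PIECE_ID" | tr 'A-Z' 'a-z')
prefix=$(printf '%s' "$lower" | cut -c1-2)
rest=$(printf '%s' "$lower" | cut -c3-)
echo "blobs/<satellite-dir>/${prefix}/${rest}.sj1"
```

You could then look for the file directly, e.g. `find <storage>/blobs -name "${rest}.sj1"`, without needing to resolve the satellite directory by hand.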
Then these pieces are actually lost. However, it is a little concerning that you have no evidence these pieces were ever uploaded to your node.
It’s concerning that I have missing data. The node is 15 months old, and I only keep six weeks of logs, so it’s not surprising I don’t have any other record of the missing files.
What surprised me was that it’s looking in the hashstore for data, but that’s not enabled.
Hello. Yes, you’re right. I checked, and my old logs were deleted. I forgot to restart journald after increasing the log space.
Unfortunately, now we can only guess what happened to this file. Similar issues haven’t occurred since; the node is new, only a month old. SMART is OK, and the filesystem is ext4 (Linux host).
Everything is fine with it; it’s a server in a data center. The only thing I did was increase the available disk space in the configuration file and restart the storagenode process via systemctl.
The satellite may try to request it again; this would keep happening until the piece is deleted by the customer, or until the segment falls below the repair threshold and is repaired, at which point the pointer to this piece on your node would be removed.
Thanks for the information. There have been no more requests for this segment so far, and the audit score is again approaching 100% (from 99.91 to 99.95).