Audit scores dropping on ap1; Storj is working on a fix, SNO's don't need to do anything. ERROR piecedeleter could not send delete piece to trash >> Pieces error: v0pieceinfodb: sql: no rows in result set

I have the same for the ap1 satellite now


Yep - AP1 hit for me too.

The AP1 failures are related to a different issue.

I seem to be failing audits on ap1.storj.io as it’s auditing pieces it’s already deleted.
This is a satellite issue, not a storagenode issue.


seems so yes… i’m not really too familiar with the storj network mechanics… but brightsilence suggested the same thing, it’s a satellite issue.

i’m going to shut down my nodes if the satellite’s audit scores start hitting 70% or so…
to avoid DQ


I’m not seeing anything hit my dashboard yet, it’s still all 100%, but there are errors in my logs. My newest node seems to be hit the hardest.

1 of these:
2021-07-23T00:30:03.080Z ERROR piecestore download failed {"Piece ID": "M4JJX762SFXPUQGEMQWSIDRJHDN7KDCEW3X5ABXFGRY7ZEOJTDKQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "file does not exist"

10 of these, same sat:
2021-07-23T00:30:03.080Z ERROR piecestore download failed {"Piece ID": "M4JJX762SFXPUQGEMQWSIDRJHDN7KDCEW3X5ABXFGRY7ZEOJTDKQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "file does not exist"

80 of these, mostly same sat:
"Piece ID": "WXMCZYCAYEWVSOVWUL3L4NK26WL47OSE6UZ6LJFK6OIJJ3R4N4KA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "unexpected EOF"

Seems to be another satellite giving bad audits as well.
Found it, I had checked the wrong log.

2021-07-22T23:42:22.240Z ERROR piecestore download failed {"Piece ID": "GJLVEDSUDLETGCTXQCG2LKLJUEWAN2LOKSRUSMOVZRMUA7PQSWLQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}


@deathlessdd @KernelPanick @SGC @waistcoat @Ted @LinuxNet
Please search your logs for all records of this piece.
The problem here is only with pieces that were deleted as expired and then failed to download. If your node lost pieces for other reasons, it’s not related to the current problem with the us2 satellite.

Unfortunately I don’t have logs going far back because the node recently updated. But if more and more people are seeing the same issue, it’s not because my node lost a file.
My node is still failing audits, though.

2021-07-22T23:53:46.965Z        ERROR   piecestore      download failed {"Piece ID": "ORX5MM6SCZOUJ5HPTU7QRDON34EKYLAJOUUUO4HC4WZ2HEKPEX6Q", "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2021-07-23T10:28:23.241Z        ERROR   piecestore      download failed {"Piece ID": "RAYFOF5VE2LYJQBGXJ443VT3L3C7RLE2BDQCEJAXSL2VJJAO5OCA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}


this is widespread, tho it doesn’t seem to affect all nodes yet…
there are no issues with my setup, i could barely make my system lose a piece if i tried…

i’ve only got logs from the 16th, sadly… had some issues with my OS and i mostly just dumped them… because they were using an older logging method.

i’ve also seen another 3 failed audits in the last couple of hours.
started a scrub on the 21st when i noticed this issue, and it’s done now… i don’t have a trace of any data corruption, and haven’t had downtime or other issues.

i’m trying to get some useful log info, but i’m not too used to actually retrieving data from my logs… i really need to get loki installed… so mostly i’ve just been learning how inept i am at searching through all this data in an accurate way lol

Perhaps your situation with audits failures on AP1 satellite is related to Audit scores dropping on ap1; Storj is working on a fix, SNO's don't need to do anything. ERROR piecedeleter could not send delete piece to trash >> Pieces error: v0pieceinfodb: sql: no rows in result set - #17 by BrightSilence

I didn’t blame your setup, sorry if I somehow gave you that impression.
I just needed excerpts from the logs to be sure that the situation is not related to the known issues Audit weirdness (us2 satellite). (Team aware. No additional information is needed) and Audit scores dropping on ap1; Storj is working on a fix, SNO's don't need to do anything. ERROR piecedeleter could not send delete piece to trash >> Pieces error: v0pieceinfodb: sql: no rows in result set - #17 by BrightSilence


Never lost a file. Strangely, this has been happening since yesterday, and since there are several SNOs with the problem, this time I do not accept that there is something wrong with the node. All nodes have been running continuously for several months except for updates.

I’ll turn off my nodes then too. Node 3 has already dropped to 90%.

We don’t have any other choice if we don’t want to lose all these months of work.

You don’t need to shut down your nodes. If the audit drop is caused by the known issues, any DQ would be reverted, see

The check is easy enough: search your logs for the piece to see whether it was deleted as expired or because of an error moving it to the trash.
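A minimal sketch of that check, assuming your node log is a plain text file where every line starts with an ISO timestamp (as in the excerpts posted in this topic). The function name `piece_history` is made up for illustration and is not part of any Storj tooling.

```python
from typing import Iterable, List, Tuple


def piece_history(lines: Iterable[str], piece_id: str) -> List[Tuple[str, str]]:
    """Return (timestamp, rest_of_line) for every log line mentioning piece_id."""
    history = []
    for line in lines:
        if piece_id not in line:
            continue
        # Assumption: the first whitespace-separated token is the timestamp.
        parts = line.strip().split(None, 1)
        timestamp = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        history.append((timestamp, rest))
    # ISO 8601 timestamps in the same time zone sort correctly as plain strings.
    history.sort(key=lambda item: item[0])
    return history
```

To use it, open your own log file, pass the file object as `lines` together with the Piece ID from the failed audit, and read the result top to bottom. If a delete-related entry shows up before the failed GET_AUDIT, that is the history being asked for here.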


i tried to track the piece, but like i said, the logs are pretty full of holes after yesterday’s overload, since they were growing by more than 1MB per 10 minutes and that was the limit i had set in docker.

and the logs only go back until the 16th…

tried to look for additional info on the piece, but drew a blank…
and to be fair, how do i separate us2 from ap1 when it’s the same storagenode, and both sats saw audit score drops in the same time period after a long pause of over 12 hours? the only weird thing about it is that ap1 didn’t show up in the log… even tho i looked through all the GET_AUDIT entries
and even scanning the log with successrate.sh still didn’t find an audit failure, even tho the dashboard clearly shows it…

sure, maybe these are different issues… but they are still so close to each other that i couldn’t even separate the two topics if i wanted to, since the node dashboard clearly shows both issues.

but whatever, it doesn’t seem like my storagenodes are in danger of getting DQ’d at present…
so i’ll just keep an eye on it and let the pros deal with it lol

There is a satellite ID in each log line, so that should give you a hint on which issue might be causing them. There are 2 known and separate issues. Everyone who has been able to provide historic logs for relevant pieces has confirmed that these issues are isolated to their respective satellites. It doesn’t help anyone to speculate based on what you admit is 0 evidence.
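As a rough illustration of that hint, here is a sketch that counts failed audit downloads per satellite ID. It assumes the line layout seen in the excerpts above (a "download failed" message followed by a JSON object with "Satellite ID" and "Action" fields). The function name is hypothetical, and truncated or wrapped lines are simply skipped.

```python
import json
from collections import Counter
from typing import Iterable


def failed_audits_by_satellite(lines: Iterable[str]) -> Counter:
    """Count 'download failed' lines with Action GET_AUDIT, keyed by Satellite ID."""
    counts = Counter()
    for line in lines:
        if "download failed" not in line:
            continue
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or terminal-wrapped line, cannot be parsed
        if fields.get("Action") == "GET_AUDIT":
            counts[fields.get("Satellite ID", "unknown")] += 1
    return counts
```

Compare the IDs in the result against the satellite list on your node dashboard to see which satellite each count belongs to.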

Issue on us2: This satellite is infrequently auditing pieces that expired a while ago. Post things related to that issue in the corresponding topic: Audit weirdness (us2 satellite). (Team aware. No additional information is needed)

Issue on ap1: Deletes are for some reason not finished correctly, leading to satellites resending the deletes and still auditing and repairing the respective pieces. That’s what the topic you’re in is about.
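For anyone who wants to scan a whole log for that pattern, a hedged sketch follows. It flags pieces that have a delete-related log entry followed later by a failed GET_AUDIT download. The matching rule is an assumption: any line whose text before the JSON part contains "delete" counts as delete-related, which covers the "could not send delete piece to trash" error from the topic title but may match other entries too. The helper names are made up for illustration.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def _parse(line: str) -> Optional[Tuple[str, str, dict]]:
    """Split a log line into (timestamp, text before the JSON, JSON fields)."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        fields = json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # truncated or wrapped line
    head = line[:start].split(None, 1)
    if len(head) < 2:
        return None
    return head[0], head[1], fields


def audited_after_delete(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map Piece ID -> (first delete timestamp, first later failed audit timestamp)."""
    first_delete: Dict[str, str] = {}
    suspects: Dict[str, Tuple[str, str]] = {}
    parsed_lines = [p for p in map(_parse, lines) if p is not None]
    # ISO timestamps sort chronologically as strings, so order the lines first.
    for timestamp, message, fields in sorted(parsed_lines, key=lambda p: p[0]):
        piece = fields.get("Piece ID")
        if piece is None:
            continue
        if "delete" in message:  # assumption: loose match on delete-related entries
            first_delete.setdefault(piece, timestamp)
        elif ("download failed" in message
              and fields.get("Action") == "GET_AUDIT"
              and piece in first_delete):
            suspects.setdefault(piece, (first_delete[piece], timestamp))
    return suspects
```

A piece that shows up in the result has the delete-then-audit history described above. A failed audit with no earlier delete entry in your logs will not show up, and on its own says nothing about the cause.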

By now plenty of logs with the required context have been posted. Unless you have new findings, there is no need to post more logs. There is especially no need to post logs with even less context and speculate that things that aren’t related might be related. And it only leads to confusion to post logs that are related to the other issue or completely unrelated to both.

If you do believe that you are experiencing one of these issues on satellites other than the respective ones mentioned above, please provide logs with the full history of that piece that show that it is indeed the same issue. If you can’t provide the full piece history, there is no use in posting just the failed audit. Without more information, nobody can tell you why you failed that audit.

Luckily the Storj Labs team is aware and working on both issues and has already said that SNOs don’t need to take any action and any negative effects related to the issues will be reverted from their end. So just sit back and give them time to implement the fixes. And keep in mind that it is the weekend at the moment, so it might be a bit before everything is fixed and all symptoms are gone. I’m certain they will report back when there is more news on this.

Bottom line: No data was ever at risk and no storage nodes are at risk.


err, the eu1 sat is also causing failed audits…
https://forum.storj.io/t/audit-falling-on-eu1/

sure, they may look like separate issues, but it’s going to take a lot to convince me that 3 different satellites randomly develop a near-identical issue on the same day… after running flawlessly on audits afaik for 16+ months.

you can say they’re individual issues all you want, but it will most likely take a miracle to make me believe it lol

not that i don’t want to… i just can’t, it seems so unlikely…

They are different issues. That’s the point. The timing is irrelevant here (more likely you only checked your logs now, but the problem has been there for a while).
I know exactly what provokes the errors on us2 and what provokes them on ap1. These actions are independent of the nodes; it’s bug hunting.
eu1 doesn’t have these two separate actions, so it must be a third thing.

Please post your findings in the Audit falling on eu1 topic (I need a full history for the sample Piece ID).


one interesting thing is that the nodes i’ve got that are 5 months old all seem totally unaffected by these issues… ofc they have less data, so it could simply be down to statistics… but it seemed worth a mention.

and by unaffected i mean they show no audit failures whatsoever.

No further updates from storj on the issue?