Audit scores dropping on ap1; Storj is working on a fix, SNOs don't need to do anything.

ERROR piecedeleter could not send delete piece to trash >> Pieces error: v0pieceinfodb: sql: no rows in result set

I tried to track the piece, but like I said, the logs are pretty full of holes after yesterday's overload, since they would use more than 1MB per 10 minutes and that was the limit I had set in Docker.
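In case it helps anyone hitting the same wall: assuming that limit was set through Docker's json-file log options (which cap size per log file rather than per time window), the rotation settings can only be changed by recreating the container. This is only a rough sketch, with placeholder sizes and the usual storagenode mounts, identity, and environment options left out:

```sh
# Sketch only, not official guidance: keep more log history around by raising
# the json-file rotation limits when recreating the container. The size and
# file-count values are placeholders; the usual storagenode mounts, identity,
# and environment options are omitted and should match your existing command.
docker run -d --name storagenode \
  --log-driver json-file \
  --log-opt max-size=100m \
  --log-opt max-file=5 \
  storjlabs/storagenode:latest
```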

And the logs only go back to the 16th…

Tried to look for additional info on the piece, but drew a blank…
And to be fair, how do I separate us2 from ap1 when it's the same storagenode and both satellites saw audit score drops in the same time period, after a long pause of over 12 hours? The only weird thing about it is that ap1 didn't show up in the log… even though I looked through all the GET_AUDIT lines.
And even scanning the log with successrate.sh didn't find an audit failure, even though the dashboard clearly shows it…
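For anyone who wants to repeat that check, this is roughly what "looking through all GET_AUDIT" boils down to. It's only a sketch: <SATELLITE_ID> is a placeholder for the ID shown on the dashboard (and in every log line), and a plain "failed" match is cruder than what successrate.sh actually counts, so treat the output as a hint rather than proof.

```sh
# Filter one satellite's failed audits out of a shared node log.
# Replace <SATELLITE_ID> with the ap1 (or us2) ID from your dashboard.
docker logs storagenode 2>&1 \
  | grep GET_AUDIT \
  | grep failed \
  | grep "<SATELLITE_ID>"
```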

Sure, maybe these are different issues… but they are still so close to each other that I couldn't separate the two topics even if I wanted to, since the node dashboard clearly shows both issues at once.

But whatever, it doesn't seem like my storagenodes are in danger of getting DQed at present…
So I'll just keep an eye on it and let the pros deal with it lol

There is a satellite ID in each log line, so that should give you a hint on which issue might be causing them. There are 2 known and separate issues. Everyone who has been able to provide historic logs for relevant pieces has confirmed that these issues are isolated to their respective satellites. It doesn’t help anyone to speculate based on what you admit is 0 evidence.

Issue on us2: This satellite is infrequently auditing pieces that expired a while ago. Post anything related to that issue in the corresponding topic: Audit weirdness (us2 satellite). (The team is aware; no additional information is needed.)

Issue on ap1: Deletes are for some reason not being finished correctly, leading to the satellite resending the deletes and still auditing and repairing the respective pieces. That's what the topic you're in is about.

By now plenty of logs with the required context have been posted. Unless you have new findings, there is no need to post more logs. There is especially no need to post logs with even less context and speculate that things that aren’t related might be related. And it only leads to confusion to post logs that are related to the other issue or completely unrelated to both.

If you do believe that you are experiencing one of these issues on satellites other than the respective ones mentioned above, please provide logs with the full history of that piece that show that it is indeed the same issue. If you can’t provide the full piece history, there is no use in posting just the failed audit. Without more information, nobody can tell you why you failed that audit.
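For what it's worth, "full history" in practice just means every log line that mentions that piece, oldest first. A rough sketch, assuming the relevant logs still exist; <PIECE_ID> stands for the piece ID copied from the failed audit line, and the log path depends on your own setup:

```sh
# Pull every line that mentions one piece, across rotated log files if present.
# storagenode log lines start with an ISO timestamp, so a plain sort puts the
# history in chronological order.
grep -h "<PIECE_ID>" /path/to/storagenode/logs/*.log | sort

# Or, if the node still logs straight to Docker:
docker logs storagenode 2>&1 | grep "<PIECE_ID>"
```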

Luckily, the Storj Labs team is aware of and working on both issues and has already said that SNOs don't need to take any action and that any negative effects related to the issues will be reverted from their end. So just sit back and give them time to implement the fixes. And keep in mind that it is the weekend at the moment, so it might be a bit before everything is fixed and all symptoms are gone. I'm certain they will report back when there is more news on this.

Bottom line: No data was ever at risk and no storage nodes are at risk.

Err, the eu1 satellite is also causing failed audits…
https://forum.storj.io/t/audit-falling-on-eu1/

Sure, they may look like separate issues, but it's going to take a lot to convince me that on the same day, 3 different satellites randomly develop a near-identical issue… after running flawlessly on audits, afaik, for 16+ months.

You can say it's individual issues all you want, but it will most likely take a miracle to make me believe it lol

Not that I don't want to… I just can't; it seems so unlikely…

They are different issues. That's the point. The timing is irrelevant here (more likely you only checked your logs now, but the problem has been there for a while).
I know exactly what provokes the errors on us2 and what exactly provokes them on ap1. These actions are independent of the nodes; it's bug hunting.
eu1 doesn't have these two separate actions, so it must be something else, a third issue.

Please post your findings in the Audit falling on eu1 topic (I need the full history for a sample Piece ID).

One interesting thing is that my nodes that are 5 months old all seem totally unaffected by these issues… Of course they have less data, so it could simply come down to statistics… but it seemed worth a mention.

And by unaffected I mean they show no audit failures whatsoever.

No further updates from Storj on the issue?

I wouldn’t have expected any updates during the weekend.

Update
They are back with a vengeance this time.

The vengeance started exactly at 07:00 UTC; 30K errors so far.

Yep, they are back back back

It's not the weekend any more.

We are not there yet :slight_smile:
I will update the thread when the problem is solved.
No action is needed.

The issue should be resolved now.

Thanks for the awesome work on a quick resolution! Everything looks fine on my end now.
My curiosity is piqued. If anyone has the time, I'd love to hear a bit about what happened to cause this?

We removed the consequences. The root issue is not solved yet.

No issues for over 36 hours now; I'll say that's a win…

Is the root cause known, @Alexey?

The investigation is in progress; I have no updates so far.
I can only say that under normal circumstances it is not reproduced.

Thank you for the update.

Thanks again for the update. If we can help by providing more info, logs, or anything else on our end, please don't hesitate to ask!