ERROR trashTotal

What is this? I’ve never seen this before.
It is flooding the logs badly (roughly one new entry every 10 to 15 microseconds):


2024-11-02T16:02:45.009461+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -2319872}
2024-11-02T16:02:45.009471+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1024}
2024-11-02T16:02:45.009484+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -4864}
2024-11-02T16:02:45.009493+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -38144}
2024-11-02T16:02:45.009505+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -5632}
2024-11-02T16:02:45.009516+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1280}
2024-11-02T16:02:45.009529+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -2560}
2024-11-02T16:02:45.009539+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -182016}
2024-11-02T16:02:45.009579+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -4864}
2024-11-02T16:02:45.009592+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -4352}
2024-11-02T16:02:45.009605+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1160448}
2024-11-02T16:02:45.009615+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -182016}
2024-11-02T16:02:45.009627+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -13824}
2024-11-02T16:02:45.009640+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -4864}
2024-11-02T16:02:45.009650+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -366336}
2024-11-02T16:02:45.009661+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -182016}
2024-11-02T16:02:45.009673+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -579840}
2024-11-02T16:02:45.009683+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -7936}
2024-11-02T16:02:45.009695+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -363264}
2024-11-02T16:02:45.009707+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -284672}
2024-11-02T16:02:45.009717+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1536}
2024-11-02T16:02:45.009728+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -560128}
2024-11-02T16:02:45.009740+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -182016}
2024-11-02T16:02:45.009750+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -17920}
2024-11-02T16:02:45.009761+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1024}
2024-11-02T16:02:45.009772+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -2048}
2024-11-02T16:02:45.009782+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -73472}
2024-11-02T16:02:45.009826+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -182016}
2024-11-02T16:02:45.009840+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -207872}
2024-11-02T16:02:45.009853+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -8704}
2024-11-02T16:02:45.009863+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -1536}
2024-11-02T16:02:45.009874+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -4864}
2024-11-02T16:02:45.009885+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -60160}
2024-11-02T16:02:45.009896+01:00 storj storagenode[3020522]: 2024-11-02T16:02:45+01:00#011ERROR#011blobscache#011trashTotal < 0#011{"Process": "storagenode", "trashTotal": -6144}

None of my nodes have these messages.
Could you please enable the scan on startup and restart that node to update the databases regarding trash usage?
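If you use a config.yaml, the option (name quoted from memory, please verify it against your own config) and the equivalent command-line flag look roughly like this:

    # config.yaml (restart the node after changing it)
    storage2.piece-scan-on-startup: true

    # or as a flag for a docker / binary setup
    --storage2.piece-scan-on-startup=true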

Or, maybe it’s related to this:

But your node didn’t have these required pieces? If so, it smells like a disqualification.

Could you please also check your databases?

No, it means that I cannot reproduce it, thus I need more details or reports from other SNOs.
I think that maybe something has gone wrong with your node, and we need to figure out what, especially if this is confirmed by other SNOs.

Right now I’m not convinced that there is a bug.

Let’s try to exclude the usual things like a corrupted database or filesystem, then we can proceed further.
Did you check all databases and disks? Also, do you have these errors only on one node?

If the node software produces an error entry every 10 microseconds, that’s not a bug to you?
Even if you can’t reproduce it on your own machine, this should NEVER happen on any machine.


To me, a bug should be visible on every installation of that binary.
If not, we need to figure out what conditions trigger it.
Do you have it, by the way?
If not, is it still a bug in the binary for you?

Because we could also call it a bug that the node is offline because the setup doesn’t have a public IP. Do you see what I mean?

We need to find a condition which allows us to reproduce it, otherwise it would likely never be fixed.

So I need more details about this setup, and confirmation that the usual problems, like DB and FS corruption, are excluded.

So if a bug doesn’t show up on one setup out of a million affected setups, that disqualifies the bug completely, and devs should not work on it at all? That’s nonsense. Try to sell this strategy to any dev team; I wonder what Microsoft would say about it. They, like any other software producer, have to fix niche bugs every day that affect only a fraction of their users, but they have to do the work anyway.

I have not checked it yet, but I will look into the logs. If someone else (not me) is affected, then yes, it is a bug.

That would be a misconfiguration by the user, no?

Yes. But flooding the log is a bug; it should never happen, regardless of the cause. Even if the user has misconfigured something, or data got corrupted due to a hardware problem, storj should not flood the logs like that. That’s the first bug: log flooding.
The second bug can be confirmed (or rejected, if it’s a user problem) once we get to the bottom of this and determine the cause.
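To illustrate what I mean by “should never happen”: services that log through zap (which the node’s output format suggests) can cap repeated messages with the built-in sampler. This is only a sketch of the technique, not the node’s actual logging setup:

    package main

    import "go.uber.org/zap"

    func main() {
        cfg := zap.NewProductionConfig()
        // Keep the first few occurrences of each message per second,
        // then only every 1000th one, so a hot error path cannot flood the log.
        cfg.Sampling = &zap.SamplingConfig{Initial: 3, Thereafter: 1000}
        logger, err := cfg.Build()
        if err != nil {
            panic(err)
        }
        defer logger.Sync()

        // Simulate the flood: only a handful of these lines are actually written.
        for i := 0; i < 100000; i++ {
            logger.Error("trashTotal < 0", zap.Int64("trashTotal", -2319872))
        }
    }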


Please re-read my comment one more time: if the bug exists in the binary, all setups should have it, including yours and everyone else’s.
Please answer the important question: do you have this issue? It would be much more helpful.

This is my current suspect.

I prefer to find a condition to reproduce it, otherwise it would take much more time to fix it.
If you have the same issue in your logs, please check the databases too.
If the database is to blame, it needs to be fixed.

Could you please check your databases and disks for errors?
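For the databases you can run an integrity check on each sqlite file, and for the disk a filesystem check while the node is stopped. For example (the paths and the device are only placeholders, adjust them to your setup):

    # sqlite databases: repeat for every *.db file in the storage location
    sqlite3 /mnt/storj/storage/bandwidth.db "PRAGMA integrity_check;"
    sqlite3 /mnt/storj/storage/piece_expiration.db "PRAGMA integrity_check;"

    # filesystem check, with the node stopped and the volume unmounted (ext4 example)
    sudo fsck -f /dev/sdX1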

I don’t need to re-read your comment, I disagree with it.

My experience is not important at all. Why? Because whatever I experience does not change anything for the OP, does it?
And I’ve checked the docker logs; I am not affected by this issue.

Are you a developer? It seems to me that you are not, and that you can’t understand a situation where not 100% of computers are affected by a particular bug. This is quite common.

The first step is for you to admit that the storj software has a problem. We may have a bug to report to the devs; we just need to gather as much info as possible before contacting them. It’s not a question of whether this is a bug or not: log flooding is by definition a problem, it should not happen, and that is simple to understand.
And whatever caused the flooding needs to be investigated too; it could be another bug, or a user misconfiguration.


@raert @xgDkAbzkp9yi

Please stop spamming with useless replies.

To analyze your error we need information on the node setup… not 10 replies without technical details…


@raert is the creator of this topic; how can he be “spamming with useless replies”?
And I am participating in this discussion as usual.

If you are not interested in debugging this issue, do not read it, and do not participate, no one forces you to. But please don’t dictate to others what they should do to resolve this problem.


There are two kinds of bugs: in the app and in the setup (the config + data). If it’s in the app, it will be present on all setups, in either a dormant or an active state. The more common kind of bug is in a setup.
To claim a bug, it needs to be reproduced, otherwise the fix would not be implemented fast enough.
Thank you for confirming that you do not have this issue either. So far it looks like it is related to the data or config of this particular node.

Unfortunately this cannot be classified as a bug per se. You may get the same behavior if you specify a wrong external address or port: your logs blow up with “ping satellite failed” messages.
The logs are used to identify the issue; silencing them is not a solution. The OP can silence them, by the way, either by globally reducing the log level to none or partially with --log.custom-level.
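For example, something along these lines (the logger name is taken from the error prefix above; please verify the exact syntax for your node version):

    # silence only the offending logger, keep everything else as it is
    --log.custom-level=blobscache=FATAL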
But I prefer to find a root cause instead.


Yes, we can try that. But please, let’s fix the issue for you first.
Please check the filesystem and the databases, and we will continue to mitigate the issue if everything is correct there.

I suspect that your databases may have reported the wrong amount of trash used, and when the restore from trash happened, trashTotal became negative.
Please confirm that there are no issues with the filesystem and databases.
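For example (the numbers are made up, except the final value, which matches the first entry in your log):

    trash total recorded in the databases:   1,000,000 bytes  (undercounted)
    bytes actually restored from the trash:  3,319,872 bytes
    updated total: 1,000,000 - 3,319,872  =  -2,319,872  ->  “trashTotal < 0” on every later update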
