4 TB node has been full for the past few months, still only 2.17 TBm of storage paid

On some of my nodes there is a big difference between "used" and "average".
The filesystem is ext4; "du --si -s" reports roughly the same as the "used" size.

Hello @Dozoris,
Welcome to the forum!

Please check the filesystem for errors and fix them.
Please check the databases for errors as well.

I have tried both and everything is OK. This has been going on for several months. At first I thought it was because the filewalker was turned off, but it has been on for about a month now and nothing has changed.

Then please search for "retention" in your logs.

I have reduced my logs to 10 MB, so less than a day of logs is stored now. I will record the logs separately and let you know when "retention" comes up. Thank you.

Sorry, I was wrong, the keyword is "retain".

Thank you for the information. I hope it won't take too long.

Not to come across like an ass, but it sounds like quite a few of us are being shortchanged, with our payout values reported as less than what they actually are. That's not so great to find out, but at least I'm not the only one it is happening to.


So far, two "retain" runs with the lazy filewalker, on the US1 and EU1 satellites:

2023-06-24T21:03:05.631Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-06-20T11:59:59.942Z", "Filter Size": 3006, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.631Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.632Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.857Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2023-06-24T21:03:05.857Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "createdBefore": "2023-06-20T11:59:59.942Z", "bloomFilterSize": 3006, "process": "storagenode"}
2023-06-24T21:35:49.324Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "piecesCount": 3258248, "piecesSkippedCount": 0}
2023-06-24T21:35:49.327Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:35:49.327Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-27T06:08:26.388Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-06-21T11:59:59.989Z", "Filter Size": 2544, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.388Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.389Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.467Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-06-27T06:08:26.467Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2023-06-21T11:59:59.989Z", "bloomFilterSize": 2544}
2023-06-27T06:21:01.440Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "piecesCount": 10161776, "piecesSkippedCount": 0}
2023-06-27T06:21:01.445Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:21:01.445Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
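
For anyone else combing through their logs, here is a quick way to summarize these events. This is a minimal Python sketch, assuming the tab-separated log format with a trailing JSON payload shown above; "storagenode.log" is a placeholder path.

import json

# Summarize garbage-collection results from a storagenode log.
# Each "retain" summary line ends with a JSON payload, as in the
# excerpts above; "storagenode.log" is a placeholder path.
with open("storagenode.log", encoding="utf-8") as f:
    for line in f:
        if "Moved pieces to trash during retain" not in line:
            continue
        payload = json.loads(line[line.index("{"):])
        print(line.split()[0], "num deleted:", payload["num deleted"])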

Still investigating further.
earnings.py with DEBUG uncommented:

du --si -s:
103G 6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa
373G qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
1.6G arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa
1.9T ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
119G pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
2.7T v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa
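
Summing those per-satellite sizes by hand is easy to get wrong. A tiny helper along these lines does the arithmetic (a sketch, assuming GNU du --si suffixes, i.e. powers of 1000):

# Sum human-readable sizes as printed by `du --si` (decimal units).
SI = {"k": 1e3, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def parse_si(size: str) -> float:
    """Convert a size such as '1.9T' or '103G' to bytes."""
    return float(size[:-1]) * SI[size[-1]]

# Per-satellite sizes copied from the du output above.
sizes = ["103G", "373G", "1.6G", "1.9T", "119G", "2.7T"]
print(f"total: {sum(parse_si(s) for s in sizes) / 1e12:.2f} TB")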

So far, five "retains" in six days:

2023-06-24T21:39:38.479Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-06-20T11:59:59.942Z", "Filter Size": 3065, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.479Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.480Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.599Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2023-06-24T21:39:38.600Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "createdBefore": "2023-06-20T11:59:59.942Z", "bloomFilterSize": 3065, "process": "storagenode"}
2023-06-24T21:46:18.071Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "piecesCount": 3163488, "piecesSkippedCount": 0}
2023-06-24T21:46:18.077Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:46:18.077Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-27T05:39:23.445Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-06-21T11:59:59.989Z", "Filter Size": 1781, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.445Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.445Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.542Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-06-27T05:39:23.542Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2023-06-21T11:59:59.989Z", "bloomFilterSize": 1781}
2023-06-27T05:54:02.191Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "piecesCount": 9183327, "piecesSkippedCount": 0}
2023-06-27T05:54:02.197Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:54:02.197Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-28T19:12:56.555Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-01-22T20:31:29.101Z", "Filter Size": 152, "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.555Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.556Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.657Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-06-28T19:12:56.657Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "createdBefore": "2023-01-22T20:31:29.101Z", "bloomFilterSize": 152}
2023-06-28T19:12:59.374Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "piecesCount": 93265, "piecesSkippedCount": 0}
2023-06-28T19:12:59.378Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:59.378Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-28T22:09:06.413Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-06-21T14:57:18.932Z", "Filter Size": 7, "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.413Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.413Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.432Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "process": "storagenode"}
2023-06-28T22:09:06.432Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "process": "storagenode", "createdBefore": "2023-06-21T14:57:18.932Z", "bloomFilterSize": 7}
2023-06-28T22:09:07.079Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "piecesSkippedCount": 0, "process": "storagenode", "piecesCount": 14887}
2023-06-28T22:09:07.082Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:07.082Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-29T06:20:03.454Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-01-22T20:31:29.101Z", "Filter Size": 152, "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.455Z        INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.455Z        INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.509Z        INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-06-29T06:20:03.509Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "createdBefore": "2023-01-22T20:31:29.101Z", "bloomFilterSize": 152}
2023-06-29T06:20:06.287Z        INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "piecesCount": 93694, "piecesSkippedCount": 0}
2023-06-29T06:20:06.291Z        INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:06.291Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}

Perhaps it needs to pass through several bloom filters, but the difference is noticeable.

Thanks everyone for the log lines. They really helped in figuring out what might be going on.

So, garbage collection works by the satellite generating a bloom filter and sending it to the nodes. The nodes then walk every piece and delete any that aren't in the filter. Bloom filters are nice because they can be much, much smaller than the full set of valid pieces, but they come with the potential for false positives. To limit the false positive rate, the filters are sized based on how many entries are expected to be added. If the size is too small, the false positive rate approaches 100% and no pieces get deleted.

Recently, we had a bug where the satellite doesn't estimate the number of pieces on the node well, so it sizes the bloom filter too small. That bug should be fixed by this change, which is not yet deployed but should be soon.
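
To make the sizing effect concrete: the expected false positive rate of a bloom filter with m bits, k hash functions, and n inserted items is p = (1 - e^(-k*n/m))^k. A short Python sketch (the 10-bits-per-piece and k = 7 figures are illustrative textbook values, not necessarily the satellite's actual parameters):

import math

def false_positive_rate(n: float, m_bits: float, k: float) -> float:
    # Expected bloom filter false positive rate: p = (1 - e^(-k*n/m))^k
    return (1 - math.exp(-k * n / m_bits)) ** k

n = 10_000_000             # pieces on the node, the order seen in the logs above
k = 7                      # hash functions, near-optimal for ~10 bits per item

well_sized = 10 * n        # filter sized for the true piece count
undersized = 10 * 100_000  # filter sized for a badly underestimated count

print(false_positive_rate(n, well_sized, k))   # ~0.008
print(false_positive_rate(n, undersized, k))   # ~1.0

With the false positive rate pinned near 100%, every garbage piece "matches" the filter and survives, which fits the "num deleted": 0 pattern in the log excerpts above, where filters of a few kilobytes (or even 7 bytes) were checked against up to ten million pieces.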

I’ll try to keep an eye on the deployment and update this thread when it happens.


I found that the fix will only be included in version 1.83.

I don't really understand how normal customer deletes could have been delegated to garbage collection while this issue remained unresolved. Looking at my nodes, I already have many TB of garbage on my HDDs, and it keeps growing, since after the recent change even normal customer delete operations no longer lead to direct piece deletion on nodes.
I encourage you to keep better track of dependencies; I don't think that change should have gone live before this was resolved.
That said, I hope to see this fix go live soon.

Edit: Referring to this: Design draft: fixing deletes and server side copy


So, the change to turn off direct deletes (delegating them to GC) went out on 6/26, and we only put together on 6/30 that this problem was being caused by GC. Unfortunate!

That said, I didn't think the piece count problem was very widespread among nodes, but looking at the database it does seem to affect a large fraction of them. I should have done the due diligence to check that earlier. Thanks for bringing it up.

We're currently brainstorming some ideas for stop-gap measures before the deploy, and I'll update this thread if we end up with something.

Update: One time update of node piece counts · Issue #6025 · storj/storj · GitHub is tracking the temporary fix until the permanent fix is deployed.


Apologies. I noticed the issue linked in the change you posted was from back in April, and I thought it had been a known issue since then. I understand now that the resulting impact wasn't known until much later.

Please keep in mind that the last time large GC runs were triggered, many nodes had a lot of trouble keeping up with the high IO load. I fear this time the runs may be even bigger.


Since the last update it seems to be starting to correct itself: there is no ingress, but the average disk space is growing. No thrashing, though.
I'll report back if this trend continues.

The last day reported on that graph is unfortunately inaccurate. I haven’t seen any new GC runs yet, so I doubt anything has actually changed.

I'll keep track of it over the coming days.

I hit a good GC:

2023-07-07T23:43:22.982Z        INFO    retain  Prepared to run a Retain request.       {"process": "storagenode", "Created Before": "2023-07-04T15:00:09.006Z", "Filter Size": 597951, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2023-07-07T23:47:38.876Z        INFO    retain  Moved pieces to trash during retain     {"process": "storagenode", "num deleted": 243206, "Retain Status": "enabled"}

Hope I hit a few more of those.

The higher average from yesterday didn't hold, though, as BrightSilence already mentioned.

graph: [image omitted]