On some of my nodes I have a big difference between “used” and “average”.
The filesystem is ext4; “du --si -s” shows roughly the same size as “used”.
Hello @Dozoris,
Welcome to the forum!
Please check the filesystem for errors and fix them.
Please check the databases for errors.
I have tried that and everything is OK. This has been going on for several months. At first I thought it was because the filewalker was turned off, but it has been turned on for about a month now and nothing has changed.
Then please search for “retention” in your logs.
I have reduced the logs to 10 MB, so less than a day of logs is stored now. I will record logs separately and let you know when “retention” comes up. Thank you.
Sorry, I was wrong; it’s “retain”.
Thank you for the information. I hope it won’t take too long.
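For anyone else following along: pulling those lines out of the log is easy to script. A minimal Python sketch, assuming a hypothetical log path (on Linux, plain grep works just as well):

```python
# Minimal sketch: print the "retain" lines from a storagenode log.
# The path below is hypothetical; point it at your node's actual log file.
with open("/mnt/storagenode/node.log") as log:
    for line in log:
        if "retain" in line:
            print(line, end="")
```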
Not to come across like an ass, but it sounds like quite a few of us are being short-changed by having our payout values reported as less than what they actually are. That’s not so great to find out, but at least I’m not the only one it is happening to.
So far, 2 “retain” logs with the lazy filewalker, for satellites US1 and EU1:
2023-06-24T21:03:05.631Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-06-20T11:59:59.942Z", "Filter Size": 3006, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.631Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.632Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:03:05.857Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2023-06-24T21:03:05.857Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "createdBefore": "2023-06-20T11:59:59.942Z", "bloomFilterSize": 3006, "process": "storagenode"}
2023-06-24T21:35:49.324Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "piecesCount": 3258248, "piecesSkippedCount": 0}
2023-06-24T21:35:49.327Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:35:49.327Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-27T06:08:26.388Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-06-21T11:59:59.989Z", "Filter Size": 2544, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.388Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.389Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:08:26.467Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-06-27T06:08:26.467Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2023-06-21T11:59:59.989Z", "bloomFilterSize": 2544}
2023-06-27T06:21:01.440Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "piecesCount": 10161776, "piecesSkippedCount": 0}
2023-06-27T06:21:01.445Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T06:21:01.445Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
Still investigating further.
earnings.py with DEBUG uncommented:
du --si -s:
103G 6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa
373G qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
1.6G arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa
1.9T ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
119G pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
2.7T v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa
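For comparison, a rough Python equivalent of that per-satellite du over the blob folders (the blobs path is hypothetical; this sums apparent file sizes, which can differ slightly from du’s block-based accounting):

```python
import os

BLOBS = "/mnt/storagenode/storage/blobs"  # hypothetical storage path

# Sum the size of every piece file under each per-satellite blob folder,
# printing decimal GB to match `du --si`.
for sat_dir in sorted(os.listdir(BLOBS)):
    total = 0
    for root, _dirs, files in os.walk(os.path.join(BLOBS, sat_dir)):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    print(f"{total / 1e9:8.1f} GB  {sat_dir}")
```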
So far, 5 “retains” in 6 days:
2023-06-24T21:39:38.479Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-06-20T11:59:59.942Z", "Filter Size": 3065, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.479Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.480Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:39:38.599Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2023-06-24T21:39:38.600Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "createdBefore": "2023-06-20T11:59:59.942Z", "bloomFilterSize": 3065, "process": "storagenode"}
2023-06-24T21:46:18.071Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "piecesCount": 3163488, "piecesSkippedCount": 0}
2023-06-24T21:46:18.077Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-06-24T21:46:18.077Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-27T05:39:23.445Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-06-21T11:59:59.989Z", "Filter Size": 1781, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.445Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.445Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:39:23.542Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-06-27T05:39:23.542Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2023-06-21T11:59:59.989Z", "bloomFilterSize": 1781}
2023-06-27T05:54:02.191Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "piecesCount": 9183327, "piecesSkippedCount": 0}
2023-06-27T05:54:02.197Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-06-27T05:54:02.197Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-28T19:12:56.555Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-01-22T20:31:29.101Z", "Filter Size": 152, "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.555Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.556Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:56.657Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-06-28T19:12:56.657Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "createdBefore": "2023-01-22T20:31:29.101Z", "bloomFilterSize": 152}
2023-06-28T19:12:59.374Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "piecesCount": 93265, "piecesSkippedCount": 0}
2023-06-28T19:12:59.378Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-28T19:12:59.378Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-28T22:09:06.413Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-06-21T14:57:18.932Z", "Filter Size": 7, "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.413Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.413Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:06.432Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "process": "storagenode"}
2023-06-28T22:09:06.432Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "process": "storagenode", "createdBefore": "2023-06-21T14:57:18.932Z", "bloomFilterSize": 7}
2023-06-28T22:09:07.079Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "piecesSkippedCount": 0, "process": "storagenode", "piecesCount": 14887}
2023-06-28T22:09:07.082Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2023-06-28T22:09:07.082Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
2023-06-29T06:20:03.454Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-01-22T20:31:29.101Z", "Filter Size": 152, "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.455Z INFO lazyfilewalker.gc-filewalker starting subprocess {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.455Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:03.509Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-06-29T06:20:03.509Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "createdBefore": "2023-01-22T20:31:29.101Z", "bloomFilterSize": 152}
2023-06-29T06:20:06.287Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "piecesCount": 93694, "piecesSkippedCount": 0}
2023-06-29T06:20:06.291Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-06-29T06:20:06.291Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 0, "Retain Status": "enabled"}
Thanks everyone for the log lines. They really helped in figuring out what might be going on.
So, garbage collection works by the satellite generating a bloom filter and sending it to the nodes. The nodes then walk every piece and delete any that aren’t part of the filter. Bloom filters are nice because they can be much, much smaller than the full set of valid pieces, but they come with the potential for false positives. To limit the false positive rate, the filters are sized based on how many entries are expected to be added. If the size is too small, the false positive rate approaches 100% and no pieces get deleted. Recently, we had a bug where the satellite didn’t estimate the number of pieces on the node well and so sized the bloom filter too small. That bug should be fixed by this change which is not yet deployed, but should be soon.
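To put numbers on that, here is a minimal sketch using the standard bloom filter approximations (illustrative only, not Storj’s actual sizing code; the hash-function count is an assumption). The filter size and piece count are taken from the logs above:

```python
import math

def bloom_fpr(filter_bytes: int, entries: int, hashes: int = 9) -> float:
    """Approximate false-positive rate (1 - e^(-k*n/m))^k of a bloom
    filter with m bits, n inserted entries, and k hash functions."""
    m = filter_bytes * 8
    return (1 - math.exp(-hashes * entries / m)) ** hashes

# From the logs above: a 2544-byte filter covering ~10.2 million pieces.
print(bloom_fpr(2544, 10_161_776))  # ~1.0: every piece "matches" the filter

# Sizing the other way: bits needed for n entries at false-positive
# rate p is m = -n * ln(p) / (ln 2)^2.
n, p = 10_161_776, 0.10
m_bits = -n * math.log(p) / math.log(2) ** 2
print(f"{m_bits / 8 / 1e6:.1f} MB")  # roughly 6 MB, vs. 2.5 KB in the logs
```

That lines up with the “num deleted”: 0 entries above: with a filter that undersized, essentially every piece passes the membership test, so nothing gets moved to trash.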
I’ll try to keep an eye on the deployment and update this thread when it happens.
I found that it will only be in 1.83.
I don’t really understand how normal customer deletes could have been delegated to garbage collection while this issue was still unresolved. Looking at my nodes, I already have many TB of garbage on my HDDs, and this is only growing, since after the recent change even normal customer delete operations no longer lead to regular piece deletion on nodes.
I encourage you to keep better track of dependencies. I don’t think that change should have gone live prior to resolving this.
That said, I hope to see this fix go live soon.
Edit: Referring to this: Design draft: fixing deletes and server side copy
So, the change to turn off direct deletes went out on 6/26, and we only put together that this problem was happening because of GC on 6/30. Unfortunate!
That said, I didn’t think the piece count problem was very widespread among nodes, but looking at the database it does seem to be a large fraction. I should have done the due diligence to check that earlier. Thanks for bringing it up.
We’re currently brainstorming some ideas for stop-gap measures before the deploy, and I’ll update if we end up with something.
Update: One time update of node piece counts · Issue #6025 · storj/storj · GitHub is tracking the temporary fix until the permanent fix is deployed.
Apologies. I noticed the issue linked in the change you posted was from back in April and thought it was a known issue since then. I understand now that this resulting impact wasn’t known until much later.
Please keep in mind that the last time large GC runs were triggered, many nodes had a lot of trouble keeping up due to the high IO load. I fear this time they may be even bigger.
Since the last update it seems to be starting to correct: no ingress, but the Average Disk Space is growing. No thrashing, though.
I’ll report if this trend continues.
The last day reported on that graph is unfortunately inaccurate. I haven’t seen any new GC runs yet, so I doubt anything has actually changed.
I’ll keep track of it over the upcoming days.
I hit a good GC:
2023-07-07T23:43:22.982Z INFO retain Prepared to run a Retain request. {"process": "storagenode", "Created Before": "2023-07-04T15:00:09.006Z", "Filter Size": 597951, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2023-07-07T23:47:38.876Z INFO retain Moved pieces to trash during retain {"process": "storagenode", "num deleted": 243206, "Retain Status": "enabled"}
Hope I hit a few more of those.
The higher average of yesterday didn’t hold though, as BrightSilence already mentioned.
graph: