When will "Uncollected Garbage" be deleted?

If that’s the satellite data you are pulling, then it can’t be trusted. For example, US-1 has several days where it is either not reporting at all or reporting garbage like this:

{"atRestTotal":0,"atRestTotalBytes":0,"intervalInHours":-17738274,"intervalStart":"2024-07-29T00:00:00Z"},

The good thing is, that means less garbage on your disk.

Yes, the data I’m referencing is indeed from the SL satellite, and you’re right that not all data is consistently reported. However, I’m considering the overall trend over a period of time, rather than focusing on specific dates.

Even if my estimate of 4.4 TB isn’t exact, the key issue is that there’s still a significant amount of space occupied by data that hasn’t been deleted after TTL expiration. The Bloom Filters (BFs) are not clearing nearly as much data as the uncollected garbage suggests.

While my calculations may not be perfectly precise, it’s evident that there’s an issue with uncollected garbage not being fully cleaned up.

I usually avoid tagging specific people directly, but since I haven’t received any feedback from the Storj team on my findings, I’d appreciate some insights. @Alexey, do you have any thoughts on this?

Interesting. But this can’t really be different from what the API displays. Have you checked whether there is a difference?

The problem is that we have many, many different issues which all mix together, and at least some of them need to be solved before I can get a clear picture.

For example this issue:

Yes, of course, this may lead to a lot of garbage being left on a drive. AFAIK, currently the only way this gets cleared is through the Bloom filter, which probably makes it slower and less efficient than planned.
Other issues are databases not being updated and, as was said before, satellites not reporting consistent numbers. So what I am trying to say is that there are so many different issues that the real extent of the problem might be smaller than your numbers suggest. Or maybe it is even bigger. Without consistent and accurate numbers it is hard for me to tell (yet).

Thank you for your feedback! You’re right; there could be multiple issues contributing to the uncollected garbage. My main concern isn’t necessarily identifying the specific causes, but rather addressing the broader issue.

Instead of debating which issues are contributing to the uncollected garbage, let’s focus on two key questions in this specific topic:

  1. Is there a new bug we need to report concerning data not being removed after TTL expiration?

  2. And why does it seem like the Bloom Filters aren’t clearing up the uncollected garbage?

I can’t answer this. I don’t see an apparent new issue, but then suddenly something like the non-deletion of cancelled uploads surfaces, or a new update turns out not to be updating the trash.
Let’s say it is dynamic…

Have you checked whether you suffer from slow deletes? Looking across all my nodes, I would not say deletes do not work at all; on some days I see large amounts of trash being added. However, I also see that sometimes deletion cannot keep up:

With every new Bloom filter, the list of date folders gets longer and longer.
Also, what we are seeing is probably the deletion of free-tier data. Have you ever checked the size of the subfolders in your date folders, i.e. how many files are in there? Because if the Bloom filters work, that could be quite a lot of files.
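
In case it helps, here is a rough sketch for doing that count, assuming the default layout storage/trash/<satellite-id>/<YYYY-MM-DD>/… and a hypothetical mount point (adjust TRASH to your node; the layout may differ between versions):

import os

TRASH = "/mnt/storj01/storage/trash"  # hypothetical path, adjust to your node

# Count files and total size per date folder in the trash.
for sat in sorted(os.listdir(TRASH)):
    sat_dir = os.path.join(TRASH, sat)
    if not os.path.isdir(sat_dir):
        continue
    for date in sorted(os.listdir(sat_dir)):
        date_dir = os.path.join(sat_dir, date)
        if not os.path.isdir(date_dir):
            continue
        count, size = 0, 0
        for root, _, files in os.walk(date_dir):
            for name in files:
                count += 1
                size += os.path.getsize(os.path.join(root, name))
        print(f"{sat} {date}: {count} files, {size / 1e9:.1f} GB")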

I appreciate your feedback, thanks! Fortunately, I haven’t experienced any slow FileWalker processes, and I don’t have recurring issues with garbage collection (GC). The graph on the right, which I posted earlier, shows all the trash folders on my nodes over time, stacked by expected release/removal date, filtered specifically for the SL satellite in this case. My observation is that while I am receiving and successfully processing Bloom Filters (BFs), they aren’t marking enough data as trash. I also wonder if this could be related to the maximum BF size, although, as shown in my graph, it was able to mark almost 1 TB of data as trash before.

OK, but for the Saltlake satellite you should not expect the Bloom filter to handle deletions. This is what the TTL is for (at least for the current test data). Also, these TTL-deleted pieces will not go into the trash; they will be / should be deleted off the disk without touching the trash folder.
Other than that, of course there could be bugs, but maybe you also have less uncollected garbage than you think. However, what we are seeing, where only about 50% of used space is accounted for in the monthly average, feels like way too much.
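
If someone wants to check the TTL side directly, here is a rough sketch that counts entries in the expiration database that are already past their TTL but still listed. It assumes the default piece_expiration.db with a piece_expirations table and a piece_expiration timestamp column (this may differ between versions) and should be run read-only or against a copy:

import sqlite3

DB = "/mnt/storj01/storage/piece_expiration.db"  # hypothetical path, adjust to your node

con = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
# Pieces whose TTL has passed but which the collector has not removed yet.
row = con.execute(
    "SELECT count(*) FROM piece_expirations "
    "WHERE piece_expiration < datetime('now')"
).fetchone()
print(f"expired but still listed pieces: {row[0]}")
con.close()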

That’s the problem we’ve been discussing here from the beginning. TTL data doesn’t seem to get deleted when it expires. These pieces accumulate, and the Bloom filter chokes on the amount of data and removes only a tiny fraction.

5 Likes

It is just an idea, but are you sure this could not be your issue:

To me it sounds like it could explain what you are seeing, unless you have a perfect record for uploads.
It would explain the discrepancy with the satellite’s used space. It would explain why the files do not get deleted by the collector. It would explain the low numbers in your expired database. And if you have accumulated enough files, the Bloom filter would collapse and stop deleting files.
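
To illustrate that last point with back-of-the-envelope numbers: the retain Bloom filter lists the pieces to keep, and garbage pieces that collide with it (false positives) survive the GC round, so a size-capped filter degrades quickly once a node holds far more pieces than the filter was sized for. The parameters below are made up for illustration, not actual satellite values:

import math

def false_positive_rate(m_bits, n_items, k_hashes):
    # Standard Bloom filter false-positive approximation.
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

m = 8 * 2_000_000                    # a filter capped at ~2 MB (assumption)
k = 9                                # number of hash functions (assumption)
for n in (1e6, 5e6, 20e6, 50e6):     # pieces the filter has to represent
    p = false_positive_rate(m, n, k)
    print(f"{n:>12,.0f} pieces -> ~{p:.2%} of garbage survives each GC pass")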

Would that mean that slower nodes that lose races (slower and losing for whatever reason) would constantly be accumulating data on-disk that they never get paid for?

I don’t mind losing. But it would suck to use the disk space anyways… :frowning:

3 Likes

This is how I understand that issue.

1 Like

Could be, but to me it seems unlikely given the scale of the issue.

For this to be the cause, an enormous amount of TTL uploads would need to have been canceled but still written to disk. I know for a fact that my nodes were performing quite well during the massive ingress and couldn’t possibly have lost upload races in such proportions.

Also, if that were the case, my nodes couldn’t have been almost completely full, which they were four weeks ago (as reported by satellites).

1 Like

Same here. My nodes don’t run on potato hardware. I have 7 nodes on an LSI SAS2008 controller and used to have a success rate of ~99%. I can’t check the success rate anymore after I changed the log level for piecestore to FATAL.

1 Like

Yeah, something is weird here:
According to this, the amount of data my node has from Saltlake is zero (the last value before zero was 3.28TB):

df agrees with the right graph:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        56T   45T   11T  81% /storj

piece_spaced_used.db says that saltlake has 35.2TB “content” and 135GB “trash”
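
For reference, those values can be read out with something like the sketch below, assuming the default schema with a piece_space_used table holding total, content_size and satellite_id columns; run it read-only or on a copy so it doesn’t interfere with the node:

import sqlite3

DB = "/mnt/storj01/storage/piece_spaced_used.db"  # adjust to your node

con = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
for total, content, sat in con.execute(
        "SELECT total, content_size, satellite_id FROM piece_space_used"):
    # satellite_id may be stored as a blob; print it as hex in that case.
    label = sat.hex() if isinstance(sat, (bytes, bytearray)) else sat
    total, content = total or 0, content or 0
    print(f"{label}: total={total / 1e12:.2f} TB, content={content / 1e12:.2f} TB")
con.close()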

The reported used space by the satellite definitely does not match the actual used space on disk.

During the last month, my node did not get a lot of ingress:


24.73 Mbps average * 30 days is about 8 TB
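
(Quick sanity check of that figure:)

avg_mbps = 24.73                       # average ingress in megabits per second
seconds = 30 * 24 * 3600               # 30 days
total_bytes = avg_mbps * 1e6 / 8 * seconds
print(f"{total_bytes / 1e12:.1f} TB")  # prints 8.0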

So, according to the saltlake satellite, my node has 0B or 3.2TB of data.
According to df and piece_spaced_used.db my node has 35.2TB
Assuming most pieces uploaded to saltlake have 30 day TTL, then my node should have about 8TB (as that is how much was uploaded and everything uploaded before should be expired).

Three different ways to see how much data my node has and they all give very different results.

No, as you can see here When will "Uncollected Garbage" be deleted? - #4 by donald.m.motsinger

I really had ~11TB data as reported by the satellites in the first half of July.

Saltlake didn’t report today.

I do not have any updates. Except this one:

I would only suggest making sure that the usage on the pie chart of your storagenode dashboard matches the actual usage on the disk.
The Average Used is not reliable for drawing conclusions about what’s really accounted for by the satellites, due to the lack of several reports from at least two satellites.
I do not have a solution or a workaround for how to calculate the amount of uncollected garbage. You may calculate an average, but you need to exclude the missing reports (this is a tricky one: you obviously should exclude the zero ones, but also those which are lower than both the previous and the next report, because they are likely incomplete, so only fully reported days should be included when calculating an average). Perhaps it would be simpler to just take the last full report instead of an average. But I believe that @BrightSilence’s script already does exactly that.
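
A minimal sketch of that heuristic, assuming you already have the daily report values in a list (the numbers in the example are made up): drop the zero reports, then drop values that dip below both neighbours, and average what is left.

def filtered_average(reports):
    values = [v for v in reports if v > 0]  # drop missing (zero) reports
    kept = []
    for i, value in enumerate(values):
        prev_v = values[i - 1] if i > 0 else None
        next_v = values[i + 1] if i < len(values) - 1 else None
        # A value lower than both neighbours is likely an incomplete report.
        if prev_v is not None and next_v is not None and value < prev_v and value < next_v:
            continue
        kept.append(value)
    return sum(kept) / len(kept) if kept else 0.0

print(round(filtered_average([3.2, 3.3, 0, 1.1, 3.4, 3.5]), 2))  # -> 3.35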

I can only suggest taking a look at your previous period in the Payout Information; it contains the real numbers reported by the satellites and paid (these replace the estimation numbers when the satellites send payout receipts after the first two weeks of every month).

I’m really happy with the team’s feedback. Thank you for your input; it’s truly appreciated!

I understand that calculating exact numbers is challenging, and looking back, I shouldn’t have relied solely on calculations to prove my point and highlight a real issue. It also seems I’m not alone, as multiple SNOs are reporting the same problem.

Without diving into too many calculations, I’d like to emphasize a simple observation: My nodes were full four weeks ago, and the TTL has since expired. We don’t even need to consider the reported space by SL since we know the TTL expiration occurred after 30 days. Yet, the hard disks remain full, and the Bloom Filters (BFs) from SL don’t seem to be clearing it up. This is a straightforward issue — no complex analysis needed.

My two main questions remain:

@Alexey, is there anything we can check or provide to help answer these two questions?

P.S. The multiple BFs issue doesn’t affect me, as my retain folders are empty.

1 Like

I do not know how to confirm this or what else to check, except what I already suggested: please check that your disk is actually full with df --si.
I have already shared this thread with the team; if there are updates, I will try to post them or invite the team members.

3 Likes

Thanks! I’ll wait for your updates.

df --si
Filesystem                         Size  Used Avail Use% Mounted on
/dev/sdb                           4.0T  3.8T  4.7G 100% /mnt/storj01
/dev/sdc                           4.0T  3.8T  5.3G 100% /mnt/storj02
/dev/sdd                           4.0T  3.8T   14G 100% /mnt/storj03
/dev/sde                           4.0T  3.8T  4.8G 100% /mnt/storj04