When will "Uncollected Garbage" be deleted?

The Bloom filter run took about 14 hours for me to move 5.6 TB to trash. This is on ZFS, so I'm not sure whether the ARC cache helped.

Thanks
CC

20 hours on ZFS for a full 16 TB node (L2ARC, metadata only).
About 50% of the entire process done after 48 hours on plain ext4 nodes.

I’m very glad we resolved the “Uncollected garbage” issue recently and now we have great news, but I can’t shake off the feeling that the timing is a bit convenient. Call me blunt, but what if that bug was on purpose, to increase total network capacity?

Specifically, on the uncollected garbage topic, this comment is useful: When will "Uncollected Garbage" be deleted? - #283 by elek

But it only addresses client-side GC (please correct me if I’m wrong). The only thing I found was this:

Now it looks like the BF was working as expected, so… what was the bug? How was the BF generation bug on the server side found and addressed? I believe this is important to clarify, to further enhance trust between Storj and SNOs.

There is no room for conspiracy theories. If Storj wanted to choke nodes with data they could have uploaded it with infinite TTL.

Before, the BF did not target files with a TTL at all, because they were intended to be deleted by GC using the list of expired files in the DB. For some reason some files were not listed in the DB, so nothing deleted them. This is the bug. Now that it has been discovered, Storj has added a function so that the BF deletes these files as well.

I think it was shared above, including the exact patches.

TLDR: we never had such a large amount of TTL data in the database before… Each component worked as well as before, but with a large backlog of expired pieces, it didn’t work any more.

It’s a combination of various circumstances.

If something is a bug, it’s more on the SN side, which didn’t really delete expired pieces (without GC cleanup). That will be addressed by another patch: https://review.dev.storj.io/c/storj/storj/+/13789?tab=comments

The root cause is similar. Load is increasing…

Wait, am I supposed to remove all this trash that showed up manually?

Who said this? Can you quote the message which supposedly said it?

It gets deleted once it’s 7 days old. The trash cleanup checks every once in a while.

I have seen older trash folders left dangling, usually if the node is backed up or has jobs failing. I’m not a log expert, but you can search the logs for “pieces:trash” to see the last time it ran.
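
For the log search suggested above, here is a minimal Python sketch. It scans log lines for the substring "pieces:trash" (the search term quoted in this post) and keeps the last match. The function name `last_matching_line` and the log path in the usage comment are placeholders, and nothing here assumes a particular log line format.

```python
def last_matching_line(lines, needle="pieces:trash"):
    """Return the last line containing `needle`, or None if there is no match."""
    last = None
    for line in lines:
        if needle in line:
            last = line.rstrip("\n")
    return last


# Usage (the path is hypothetical; point it at wherever your node writes its log):
# with open("/path/to/storagenode.log", encoding="utf-8", errors="replace") as f:
#     print(last_matching_line(f))
```

If the result is None, either the trash cleanup has not logged anything in that file or the log has been rotated; the sketch cannot tell those cases apart.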

Please, do not do this. The pieces can be audited from the trash, and if they are missing, your node will start to fail audits and can be disqualified.
The data will be removed automatically after 7 days.

I have many GC runs from 25 August still moving files into the 25-08 folder. What happens after 7 days? GC will still be moving files there when the deletion phase starts.

They will work in parallel. If the GC hasn’t finished by the time the trash-filewalker removes pieces, the remainder will be picked up in the next run (it runs every hour by default).

However, not necessarily in the time order, see

But the amount of deleted data will be the same. I do not understand the author’s concerns: they relate to the speed of deletion, and the sort order doesn’t affect the amount of data deleted. The order also doesn’t matter, because the trash data will be deleted sooner or later once it has expired.

OK. I had no intention of doing it myself, hence the question. The question was based on some of what I’ve read in various threads.

Currently, I have over a TB of trash that has appeared and it just seems to be sitting there not being deleted.

You know, this thread did inspire me to look through the data folder on my second oldest node, and I found:

  • some items in a “garbage” folder, which is obsolete as of several months ago
  • lots of directories, and a few files, in blobs and trash for two old retired tardigrade nodes that were retired also several months ago.

So uh, here’s to a few gig of freed up storage I guess!

It will be deleted after 7 days in the trash. However, the deletion itself could be slow on your node. You can check which folders you have:

tree -L 2 /mnt/x/storagenode2/storage/trash/
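
A rough Python equivalent of the command above, as a sketch only: it lists the date folders two levels below the trash directory and marks those older than the 7 days mentioned in this thread. It assumes the layout is `<trash>/<satellite folder>/<YYYY-MM-DD>`, which is what the two-level `tree` suggests; the function name `trash_date_folders` is made up for illustration, and the exact moment the node considers a folder expired may differ from this simple date comparison.

```python
from datetime import date, timedelta
from pathlib import Path

RETENTION = timedelta(days=7)  # retention period as stated in this thread


def trash_date_folders(trash_root, today):
    """Return (satellite_folder, folder_date, older_than_retention) tuples."""
    rows = []
    for sat_dir in sorted(Path(trash_root).iterdir()):
        if not sat_dir.is_dir():
            continue
        for day_dir in sorted(sat_dir.iterdir()):
            if not day_dir.is_dir():
                continue
            try:
                folder_date = date.fromisoformat(day_dir.name)
            except ValueError:
                continue  # skip anything not named like YYYY-MM-DD
            rows.append((sat_dir.name, folder_date, today - folder_date > RETENTION))
    return rows


# Usage, with the path from the command above:
# for row in trash_date_folders("/mnt/x/storagenode2/storage/trash", date.today()):
#     print(row)
```

The sketch only reads directory names; it does not touch or delete anything in the trash.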

So a little over 7 days ago, I saw the garbage collection complete, and so now 7 days later was expecting the 5.6TB of trash to be cleared.

Sadly nothing has happened today. Is it seven days in trash? Can it be more?

Thanks
CC

The 2024-08-25 folders will be deleted tomorrow

Yes.

Trash removal is a process that deletes data incrementally, piece by piece, not some magic that clears the space instantaneously. With a large amount of data like 5.6TB, it will take some time to complete. Additionally, if you have multiple date folders, the order in which they are processed is unpredictable, which may affect when this data starts to get deleted.

I have always observed the oldest folder being deleted first. Have you seen it otherwise?

Interesting. What OS are you on?

Yes.
This is/will be the processing order of the date folders on one of my nodes:

Date         Date folder

Aug 30 04:23   2024-08-29
Aug 29 01:12   2024-08-28
Jul 7  03:11   2024-07-06
Aug 4  02:08   2024-08-03
Aug 17 01:25   2024-08-16
Aug 22 02:52   2024-08-21
Jul 29 23:06   2024-07-29
Aug 5  03:12   2024-08-04
Aug 28 02:20   2024-08-27
Aug 26 02:59   2024-08-25
Aug 10 02:14   2024-08-09
Aug 21 02:54   2024-08-20
Jul 14 02:43   2024-07-13
Aug 6  02:50   2024-08-05
Aug 25 03:11   2024-08-24
Jul 23 13:22   2024-07-23
Jul 27 02:53   2024-07-26
Sep 1  11:24   2024-09-01
Aug 11 01:24   2024-08-10
Jul 6  03:15   2024-07-05
Jul 23 03:03   2024-07-22
Aug 24 02:55   2024-08-23
Aug 19 00:02   2024-08-18
Sep 1  02:54   2024-08-31
Aug 7  03:24   2024-08-06
Aug 31 03:00   2024-08-30
Aug 9  03:13   2024-08-08
Jul 8  02:43   2024-07-07
Aug 26 23:33   2024-08-26
Aug 15 02:14   2024-08-14
Aug 8  02:34   2024-08-07
Jul 16 02:20   2024-07-15
Jul 13 01:27   2024-07-12
Jul 22 02:22   2024-07-21
Aug 17 23:49   2024-08-17
Aug 23 01:06   2024-08-22
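
To read a listing like the one above in chronological order instead, a small Python sketch can sort the folder names by date. It assumes the folder names are `YYYY-MM-DD` as shown; `oldest_first` is a made-up helper name. This only reorders the names for inspection and has no effect on the order in which the node actually processes the folders.

```python
from datetime import date


def oldest_first(folder_names):
    """Sort YYYY-MM-DD folder names chronologically, ignoring other names."""
    dated = []
    for name in folder_names:
        try:
            dated.append((date.fromisoformat(name), name))
        except ValueError:
            continue  # not a date folder
    return [name for _, name in sorted(dated)]


# First five entries of the listing above, in the order they appear there
listing = ["2024-08-29", "2024-08-28", "2024-07-06", "2024-08-03", "2024-08-16"]
print(oldest_first(listing))
```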