When will "Uncollected Garbage" be deleted?

Well, I observe that since the BF became bigger, the disk never rests (never less than 100% “active time” as Windows measures it under Ctrl+Alt+Del). Before, I was able to spot when the BF finished. I suspect the BF got so big that it became a problem for the node to finish it, just like the used-space filewalker on a full node.

Does BF processing by the node have a resume mechanism if the node gets stopped or restarted?

Most likely the issue is not the increased size of the bloom filter, but rather that lately bloom filters have been sent more often, about every 1-2 days (I see 9 trash folders for September for one of the satellites). Before, bloom filters used to be sent out weekly.

An increase in bloom filter size should not really impact disk activity at all (ignoring the extra deletions enabled by a lower false-positive rate). On the other hand, it might increase CPU and memory usage.

Yes, garbage collection should pick up close to where it stopped if the node is restarted (storagenode/{pieces,blobstore}: save-state-resume feature for GC file… · storj/storj@0f90f06 · GitHub).


The bigger bloom filters do take longer to run, though. Like hours and hours and hours for one satellite if you’re talking about multiple terabytes of data stored for it. So if they get stacked up one at a time, your node may be running bloom filters one after another continuously.

I think (?) the garbage collection time scales up with the amount you have stored. In other words, more storage → bigger bloom filter → slower garbage collection. And of course garbage collection time decreases with faster storage (caching, metadata on SSD, etc.).


That is correct, but the main bottleneck when running garbage collection is the hard drive. For GC the node must check every single blob (of a given satellite) against the bloom filter. Generally speaking, fetching blobs from the hard drive is slow and checking against the bloom filter is fast.
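To get a sense of the scale, you can count how many blobs GC has to touch for a single satellite (Linux example; the storage path and satellite folder name are placeholders for your own setup):

find /mnt/storj/storagenode/storage/blobs/&lt;satellite-folder&gt; -type f | wc -l

On a node holding multiple terabytes this is easily millions of files, which is why the drive stays busy for hours.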

As you said yourself, faster storage speeds up the GC time, because it speeds up fetching the blobs, but has no effect on checking against the bloom filter.

This is what I meant by saying that a larger bloom filter size is most likely not the cause of GC taking considerably longer. In my case, there is usually plenty of free CPU time available, so fetching the pieces is the biggest load.

If you have a somewhat underpowered system, it might struggle more with checking the blobs against the bloom filter, and then a larger bloom filter might have a bigger impact.

This also made me curious whether GC currently prefetches blobs while checking others, as this could parallelize the process and in turn speed it up.

Hey - I have over 35 TB of trash combined across all nodes and have waited for weeks.

Filewalkers finish fine - badger cache enabled - plenty of RAM and not many restarts.

Yet garbage cleaning is not really happening.

Should I (this time only) manually go in and delete the trash?
I’m on Windows and also not sure what the best way to delete it manually is.
Thanks.


You can look at your logs for “empty” to see if trash-emptying jobs are running or finished.
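For example, on Linux (adjust the log path to your setup):

grep "empty" /path/to/storagenode.log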

You can look in your trash folders, and if you have day folders that are older than 7 days, then you are 99% safe to delete them.

On Linux, it’s faster to run a manual rm command, or even to pipe the find command into rm (to avoid sorting penalties).
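For example (the storage path and satellite folder are placeholders - double-check the date folder really is older than 7 days first):

# remove the files without the shell-glob sorting penalty
find /mnt/storj/storagenode/storage/trash/&lt;satellite-folder&gt;/2024-09-05 -type f -print0 | xargs -0 rm -f
# then remove the emptied directory tree
find /mnt/storj/storagenode/storage/trash/&lt;satellite-folder&gt;/2024-09-05 -type d -delete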

On Windows… probably just deleting the directory with Explorer or with the command-line rd/del command. It should be faster than letting the node do it… but I don’t know if it’s a tiny bit or a lot.
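For example, from an elevated Command Prompt (the path is a placeholder; again, only delete date folders older than 7 days):

rd /s /q "E:\storagenode\storage\trash\&lt;satellite-folder&gt;\2024-09-05"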

If you delete trash manually, you should probably restart your node, which will trigger the used-space filewalkers again.
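On a Windows GUI node that should be something like this from an elevated PowerShell (assuming the default service name, storagenode):

Restart-Service storagenode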

Thanks - I will give it a go.
Just mostly looking for the fastest method of deleting on Windows.

Searching for “empty” in the log finds me the trash-emptying records.

The last time trash for all satellites was emptied successfully was on 03.09.2024.
The last trash-emptying record occurs on 12.09.2024.

From 12 to 17.09.2024 there are no records at all - no attempts to empty trash?

Oh, I see some today: on 17.09.2024 there was some successful trash emptying, and it took over 2 hours.

I see more successful records for this same satellite, but it’s only the same one all the time.
I see failed records for another satellite, but I also see that not all satellites were attempted.

The only reason given for the failure is that it “exited with 1”, plus some long one-line code gibberish.
I can’t copy that from the node right now; it looks standard (it doesn’t include any obvious insight or further info - just that, and another line below it says “status 1”).

So there has been no fully successful trash emptying since 03.09.2024 - and how often should it occur?
It’s the Windows GUI, version 1.112.2; I see it updated to this version on 17.09.2024.
That would explain why a trash emptying occurred after many days of absence, and why one satellite was even successful. But what about the others? I see there was no attempt to start emptying them. Wait, I do see attempts for one more satellite, but they failed multiple times at different times on that 17.09.2024.

And the disk is at 100% all the time, probably busy with a BF.
The dashboard says something like 13 TB of trash (out of 14 TB) - interesting. I will see if it starts emptying after 7 days; I wonder. Remind me if I don’t post. (Note to myself: it’s 2nd_3.)

Yes. It keeps the BF in the /retain folder and tracks its progress in the databases, so it would resume processing on the next run. It also removes a not-yet-processed BF if a newer BF has arrived. So basically it may have up to two BFs per satellite - one in process, the second on hold.
By default the node processes them with a concurrency of 1 (it was 5 several versions ago).

This could be improved a little by enabling the badger cache; see the example below.
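For reference, both settings live in the node’s config.yaml. The option names below are to the best of my knowledge - verify them against storagenode setup --help for your version before relying on them:

# number of bloom filters processed at the same time
retain.concurrency: 1
# experimental file-stat cache; set to badger to enable it
pieces.file-stat-cache: badger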

More like it’s ongoing.
Please filter it like this:

grep "\strash" /mnt/x/storagenode2/storagenode.log | grep -E "started|finished"

For Windows PowerShell:

sls "\strash" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls "started|finished"

Does it include the phrase “context canceled”? That can be a sign of an overloaded hard disk, and the trash emptying just fails. But it would probably try again in a few hours or the next day.
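A quick way to check on Windows (same log path assumption as above):

sls "context canceled" "C:\Program Files\Storj\Storage Node\storagenode.log"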

If the folder with the old trash starts with 1wFT, that’s the Saltlake satellite, which has had all of the test data (and all of the trash).

It runs every hour by default, but only one at a time. So the actual run time may not match this schedule if a deletion takes more than an hour.

Update: looks like there’s some progress.
