When will "Uncollected Garbage" be deleted?

Well, I observe that since the BF became bigger, the disk never rests (never less than 100% “active time” as Windows measures it under Ctrl+Alt+Del). Before, I was able to spot when the BF finished. I suspect the BF got so big that it became a problem for the node to finish it, just like the used-space filewalker on a full node.

Does BF processing by the node have a resume mechanism if the node gets stopped or restarted?

Most likely the issue is not the increased size of the bloom filter, but rather that lately bloom filters have been sent more often, about every 1-2 days (I see 9 trash folders for September for one of the satellites). Before, bloom filters used to be sent out weekly.

An increase in bloom filter size should not really impact disk activity at all (ignoring the extra deletions enabled by a lower false-positive rate). On the other hand, it might increase CPU and memory usage.

Yes, garbage collection should pick up close to where it stopped if the node is restarted (storagenode/{pieces,blobstore}: save-state-resume feature for GC file… · storj/storj@0f90f06 · GitHub).


The bigger bloom filters do take longer to run, though. Like hours and hours and hours for one satellite if you’re talking about multiple terabytes of data stored for it. So if they get stacked up one at a time, your node may be running bloom filters one after another continuously.

I think (?) the garbage collection time scales up with the amount you have stored. In other words, more storage → bigger bloom filter → slower garbage collection. And of course garbage collection time decreases with faster storage (caching, metadata on SSD, etc.).


That is correct, but the main bottleneck when running garbage collection is the hard drive. For GC the node must check every single blob (of a given satellite) against the bloom filter. Generally speaking, fetching blobs from the hard drive is slow and checking against the bloom filter is fast.
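To get a sense of the scale, you can count how many blobs GC has to touch for a single satellite (Linux example; the storage path and satellite folder name are placeholders for your own setup):

find /mnt/storj/storagenode/storage/blobs/&lt;satellite-folder&gt; -type f | wc -l

On a node holding multiple terabytes this is easily millions of files, which is why the drive stays busy for hours.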

As you said yourself, faster storage speeds up the GC time, because it speeds up fetching the blobs, but has no effect on checking against the bloom filter.

This is what I meant by saying that a larger bloom filter size is most likely not the cause of GC taking considerably longer. In my case, there is usually plenty of free CPU time available, so fetching the pieces is the biggest load.

If you have a somewhat underpowered system, it might struggle more with checking the blobs against the bloom filter, and then a larger bloom filter might have a bigger impact.

This also made me curious whether GC currently prefetches blobs while checking others, as this could parallelize the process and in turn speed it up.

Hey - I have over 35 TB of trash combined across all nodes and have waited for weeks.

Filewalkers finish fine - badger cache enabled - plenty of RAM and not many restarts.

Yet garbage cleaning is not really happening.

Should I (this time only) manually go in and delete the trash?
I’m on Windows and also not sure what the best way to delete it manually is.
Thanks.


You can look at your logs for “empty” to see if trash-emptying jobs are running or finished.
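For example, on Linux (adjust the log path to your setup):

grep "empty" /path/to/storagenode.log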

You can look in your trash folders, and if you have day folders that are older than 7 days, then you are 99% safe to delete them.

On Linux, it’s faster to run a manual rm command, or even to pipe the find command into rm (to avoid sorting penalties).
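For example (the storage path and satellite folder are placeholders - double-check the date folder really is older than 7 days first):

# remove the files without the shell-glob sorting penalty
find /mnt/storj/storagenode/storage/trash/&lt;satellite-folder&gt;/2024-09-05 -type f -print0 | xargs -0 rm -f
# then remove the emptied directory tree
find /mnt/storj/storagenode/storage/trash/&lt;satellite-folder&gt;/2024-09-05 -type d -delete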

On Windows… probably just deleting the directory with Explorer or with the command-line rd/del command. It should be faster than letting the node do it… but I don’t know if it’s a tiny bit or a lot.
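For example, from an elevated Command Prompt (the path is a placeholder; again, only delete date folders older than 7 days):

rd /s /q "E:\storagenode\storage\trash\&lt;satellite-folder&gt;\2024-09-05"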

If you delete trash manually, you should probably restart your node, which will trigger the used-space filewalkers again.
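On a Windows GUI node that should be something like this from an elevated PowerShell (assuming the default service name, storagenode):

Restart-Service storagenode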

Thanks - I will give it a go.
Just mostly looking for the fastest method of deleting on Windows.

Searching for “empty” in the log finds me the trash-emptying records.

The last time trash for all satellites was emptied successfully was on 03.09.2024.
The last trash-emptying record occurs on 12.09.2024.

From 12 to 17.09.2024 there are no records at all - no attempts to empty trash?

Oh, I see some today: on 17.09.2024 there was some successful trash emptying, and it took over 2 hours.

I see more successful records for this same satellite, but it’s only the same one all the time.
I see failed records for another satellite, but I also see that not all satellites were attempted.

The only reason given for the failure is that it “exited with 1”, plus some long one-line code gibberish.
I can’t copy that from the node right now; it looks standard (it doesn’t include any obvious insight or further info - just that, and another line below it says “status 1”).

So there has been no fully successful trash emptying since 03.09.2024 - and how often should it occur?
It’s the Windows GUI, version 1.112.2; I see it updated to this version on 17.09.2024.
That would explain why a trash emptying occurred after many days of absence, and why one satellite was even successful. But what about the others? I see there was no attempt to start emptying them. Wait, I do see attempts for one more satellite, but they failed multiple times at different times on that 17.09.2024.

And the disk is at 100% all the time, probably busy with a BF.
The dashboard says something like 13 TB of trash (out of 14 TB) - interesting. I will see if it starts emptying after 7 days; I wonder. Remind me if I don’t post. (Note to myself: it’s 2nd_3.)

Yes. It keeps the BF in the /retain folder and tracks its progress in the databases, so it would resume processing on the next run. It also removes a not-yet-processed BF if a newer BF has arrived. So basically it may have up to two BFs per satellite - one in process, the second on hold.
By default the node processes them with a concurrency of 1 (it was 5 several versions ago).

This could be improved a little by enabling the badger cache; see the example below.
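For reference, both settings live in the node’s config.yaml. The option names below are to the best of my knowledge - verify them against storagenode setup --help for your version before relying on them:

# number of bloom filters processed at the same time
retain.concurrency: 1
# experimental file-stat cache; set to badger to enable it
pieces.file-stat-cache: badger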

More like it’s ongoing.
Please filter it like this:

grep "\strash" /mnt/x/storagenode2/storagenode.log | grep -E "started|finished"

For Windows PowerShell:

sls "\strash" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls "started|finished"

Does it include the phrase “context canceled”? That can be a sign of an overloaded hard disk, and the trash emptying just fails. But it would probably try again in a few hours or the next day.
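A quick way to check on Windows (same log path assumption as above):

sls "context canceled" "C:\Program Files\Storj\Storage Node\storagenode.log"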

If the folder with the old trash starts with 1wFT, that’s the Saltlake satellite, which has had all of the test data (and all of the trash).

It runs every hour by default, but only one at a time. So the actual run time may not match this schedule if a deletion takes more than an hour.

Update: looks like there’s some progress.
