When will "Uncollected Garbage" be deleted?

@littleskunk, Thank you for sharing this valuable information, I appreciate it.

Could you please elaborate on what you mean by that? Just to clarify, the graph I shared tracks the trash folders per satellite, per node, and by the release date (which is 7 days after creation/folder date). It specifically shows that I received and processed 4 BFs from SL in the last week, with an average of 60 GB marked as trash each time. The graph also indicates when a folder gets deleted by GC, as it disappears from the graph once it’s been cleared.

So, does this mean that the data will eventually be garbage collected over time, and there’s nothing we can do to speed up the process? I was really hoping to find a way to clear out the 4 TB or more of uncollected garbage more quickly so I can resume data accumulation. My nodes have been largely inactive for the past month.

Great! So you have now confirmed that there is no issue with TTL db inserts, at least on your side, right?

Only for the past 24 hours. For the day before that, I am still missing 10% of the inserts into the TTL db.

@donald.m.motsinger, I ran the same command for my node, to compare if it matches my previous count for files created on July 13:

find /mnt/storj03/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ -type f -newermt "2024-07-13 02:00:00" ! -newermt "2024-07-14 01:59:59" | wc -l

It gives me back 480,292, while before I counted 760,60 files. The db counted 715,146, which comes closer to my calculation.

I used the following command:

find /mnt/storj03/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa -maxdepth 2 -type f -daystart -mtime 28 | wc -l

In other words: count the files that were created/modified exactly 28 days ago (2024-07-11), looking at the date and omitting the time. It might not be perfect, but it is close enough for me. Would you be willing to run this command on your side and see if it makes any difference?

These 2 commands count 2 different days.

Yes, that is right. I wanted to count July 13, while you are interested in July 11, right?
So the first command is your command I changed to run on my side.
The second command is my command that I changed to run on your side.
Apologies, it’s a bit confusing :wink:

Not really. I don’t know what to do with your numbers. Every node has different uploads.

:thinking: It seems there might have been a bit of a mix-up. You don’t have to do anything with my numbers. I’m just trying to help clarify why your numbers are off by sharing what I did (and which commands I used). But of course, feel free to go with whatever approach works best for you! :slight_smile:

How would the bloom filter respond to the following scenario?

You are currently storing 20m pieces according to the satellite. 30 days later, 15m pieces are expired but for whatever reason they aren’t deleted from the disk. The satellite now knows that your node stores 5m pieces and generates a bloom filter based on that. What happens to the other 15m pieces? (shouldn’t be 10% since the satellite isn’t generating a bloom filter based on the 20m pieces that it originally was tracking)

It would still get rid of 90% of the deleted files.

For Storj, the generation of the bloom filter will take the 5M remaining files currently tracked by the satellite and generate a bloom filter based on those files, with a target of a 10% False Positive rate (this is why the bloom filter size varies depending on the number of blobs).

Once your node receives this bloom filter, it does not matter whether there are 15M or 1M trash files: the bloom filter will always (as long as the max bloom filter size isn’t reached) have a 10% False Positive rate, and therefore remove 90% of the garbage. The 15M deleted files are never involved in the generation of the bloom filter in any way, so the number of deleted files has no impact.
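
To make this concrete, here is a rough, scaled-down simulation in Python. This is not Storj’s code: the make_bloom helper, the hashing scheme and the 50k/150k numbers are all made up for illustration. It just shows that a filter built only from the live pieces lets roughly 10% of the garbage survive, regardless of how much garbage there is:

# Toy simulation: the false positive rate depends only on the pieces the
# filter was built from, not on how much garbage is tested against it.
# Numbers are scaled down from the 5M live / 15M garbage example for speed.
import hashlib, math, os

def make_bloom(items, p=0.10):
    n = len(items)
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)  # bits for target FP rate
    k = max(1, round(m / n * math.log(2)))               # number of hash functions
    bits = bytearray((m + 7) // 8)

    def positions(item):
        for i in range(k):
            h = hashlib.sha256(item + bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % m

    for item in items:
        for pos in positions(item):
            bits[pos // 8] |= 1 << (pos % 8)

    def contains(item):
        return all(bits[pos // 8] & (1 << (pos % 8)) for pos in positions(item))

    return contains

live = [os.urandom(32) for _ in range(50_000)]      # pieces the satellite still tracks
garbage = [os.urandom(32) for _ in range(150_000)]  # pieces only the node still holds

contains = make_bloom(live, p=0.10)
assert all(contains(piece) for piece in live)       # live pieces are never deleted
survivors = sum(contains(piece) for piece in garbage)
print(f"{survivors}/{len(garbage)} garbage pieces survive "
      f"({survivors / len(garbage):.1%})")          # ~10%, whatever the garbage count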

That doesn’t seem to be the case, otherwise this topic wouldn’t exist. It seems to me that we need a fixed size for a bloom filter that can support the max number of pieces a node should have.

Unfortunately, this is not happening in the described way.
This is what I tried to mention a bit earlier:
the TTL collector does not delete some files after 15/30 days, and the bloom filter also leaves them behind if they are older than 45 days.
These files are simply forgotten there somehow.

Yes, I understand that something does not seem to be working right, given that I also seem to experience similar problems with my nodes (as US1 is not reporting, I can’t really tell how much uncollected garbage exists, to be honest). But if TTL data that the collector missed is not being picked up by GC, the root cause would be something other than the bloom filter itself.

As described, a bloom filter has a fixed False Positive rate once generated. Bloom filters are a well-understood data structure, meaning that their behaviour is predictable. That’s what I was talking about in my previous case. If the process works correctly, the False Positive rate will only depend on the existing data (Bloom filter - Wikipedia).

Assuming (because I haven’t looked into it) that GC really is leaving these files behind at a rate over 10% (again, assuming the max bloom filter size isn’t reached), the root cause would be elsewhere in the process. It could be that the satellites are generating the bloom filters incorrectly, or that the nodes are having some issue applying the bloom filter to some of the data.

If you ask me, the most likely cause of uncollected garbage at the moment is an issue with the collection of TTL data; GC is taking care of it, but slowly, due to the nature of the process. Combined with the fact that the disk usage databases are having some issues tracking changes in used space, this makes it seem like garbage piles up endlessly. (Again, not sure, just my opinion, could very well be totally wrong.)

As I don’t really know the cause behind the uncollected garbage, I’d rather not speculate too much, but I’m happy to resolve doubts about how bloom filters work in general.

How can I find out the filter size?

grep "Filter Size" /mnt/storagenode/node/node.log

I have a script that can line up the pieceIDs from storage node logs and TTL db. It isn’t the most elegant but it will do the job. Now we can go on a hunt.

# Selecting uploads from today only. My timezone is UTC+2 so I need to select 22:00:00 the previous day to line it up with 00:00:00 in the storage node logs. I could take a longer history but lets start simple with not so many entries.
sqlite3 /home/storagenode/ssd/sn8/dbs/piece_expiration.db "SELECT piece_expiration, hex(piece_id) FROM piece_expirations WHERE hex(satellite_id)='7B2DE9D72C2E935F1918C058CAAF8ED00F0581639008707317FF1BD000000000' AND piece_expiration>'2024-09-07 22:00:00' ORDER BY piece_expiration;" > ttl.txt

# checking only 12:50 - 12:59 here. Can be extended later
for piece in `grep uploaded /home/storagenode/logs/storagenode8.log | grep 2024-08-09T12:5 | grep "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE" | grep '"PUT"' | cut -d '"' -f 8`
do
	hex=`echo $piece==== | base32 -d | xxd -ps | tr -d '\n'`
	ttl=`grep -i $hex ttl.txt`
	if (( $? == 0 )); then
		echo found
	else
		echo missing
	fi
done

Turns out I am kind of lucky that I was able to observe it on my storage nodes. For the last 2 days there was no entry missing in the TTL DB besides a few hundred caused by a storage node restart. I had to go back 3 days or so to find a good time period.

Great job, @littleskunk! It sounds like you’ve pinpointed a potential issue with the TTL inserts and even created a GitHub issue for it — much appreciated.

The script you provided looks fantastic; I’ll give it a try on my end as well when I have the time. Thanks for diving into this!

@littleskunk I ran your script on my node with the following results. Although I did not have a lot of TTL ingress, it may be useful information:

upload date   found   missing
2024-08-06     3981      1125
2024-08-05        0        52
2024-08-04     1504       978
2024-08-03      266         1
2024-08-02    11036      1348
2024-08-01     8754        15
2024-07-31        0        65
2024-07-30     2208       650
2024-07-29     3891         7
2024-07-28        0         0
2024-07-27    16168      1157
2024-07-26     5738        38

By the calculator: the satellite expects 5M pieces, so it will generate a filter with 24M bits (3 MB), which is well within the current maximum size of a bloom filter. Then, for each unexpected piece tested there is a 10% chance that it will still survive, so on average you expect 1.5M pieces to survive despite not being expected. The subsequent bloom filter will reduce this number to 150k pieces.
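
For anyone who wants to redo the math, a few lines of Python reproduce these numbers (this is the standard bloom filter sizing formula; Storj’s exact parameters may differ slightly):

import math

n, p = 5_000_000, 0.10                       # expected pieces, target false positive rate
m = -n * math.log(p) / math.log(2) ** 2      # filter size in bits
print(f"{m / 1e6:.0f}M bits = {m / 8 / 1e6:.0f} MB")   # 24M bits = 3 MB

garbage = 15_000_000
for run in 1, 2, 3:
    garbage *= p                             # each GC run keeps ~10% of the leftovers
    print(f"after GC run {run}: ~{garbage:,.0f} garbage pieces remain")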

Consider a simple exercise: you have a dictionary of random English words, and you want to construct a “bloom” filter that only selects the following words: abstract, aghast, brass, chart, grammar, mascara, scarf, straw, swatch, tasty, tray, warm, wrath, yacht. So you make the following filter: keep only words that do not have any of the letters eiklnpou. Each of the letters is a yes/no question to test for, and you only keep a word if the word doesn’t fail any test.

The larger the dictionary you started from, the more words you remove, because any random word outside the list of words to keep is likely to fail at least one of the questions, i.e. to contain at least one of the letters eiklnpou. You will obviously sometimes find an unexpected word that by chance also does not contain any of those, but on average the chance is the same for each word, whether you start with a longer or shorter list.

My local /usr/share/dict/american-english has 104334 entries, and only 1727 would pass all these tests:

% wc -l /usr/share/dict/american-english
104334
% </usr/share/dict/american-english grep -v '[eiklnpou]' | wc -l
1727

If I had a smaller dictionary with, let’s say, 10433 random words (10%), then I get proportionally fewer words matching the tests as well:

% shuf /usr/share/dict/american-english | head -10433 | grep -v '[eiklnpou]' | wc -l
179

(I’m getting values from 160 to 196)

So I would fully expect that a bigger dictionary would also have proportionally more words matching.

Now, any observation that strays from this theoretical exposition would mean Storj’s implementation is at fault, for example by not having the right questions to ask. If you want to keep a list of the one thousand most often used words in English, a naïve letter-based filter is not enough, because you cannot construct a letter-based yes/no question that other words would fail while all the words you want to keep would pass: the words you want to keep use all 26 letters commonly used in English. You need a better, bigger set of characteristics to filter by (maybe syllables, or trigrams).

Bloom filters are just a way of constructing those questions with good mathematical properties (like being very particular about which pieces they choose) while keeping the filter small. But the principle is the same: the satellite sends a list of yes/no questions, and any piece that fails at least one of the questions is to be removed. The bigger the filter, the better the questions available.
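
In code terms, and only as a sketch of a plain bloom filter with k hash functions (Storj’s real filter has its own hashing and encoding), each question about a piece is “is bit h_i(piece) mod m set in the filter?”, and the piece is kept only if every answer is yes:

# Sketch only: each of the k "questions" is "is bit h_i(piece_id) mod m set?"
# The piece is kept only if all k answers are yes; a single "no" marks it as garbage.
import hashlib

def questions(piece_id: bytes, m: int, k: int):
    for i in range(k):
        digest = hashlib.sha256(piece_id + bytes([i])).digest()
        yield int.from_bytes(digest[:8], "big") % m   # bit position to check

def keep_piece(piece_id: bytes, bits: bytearray, m: int, k: int) -> bool:
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in questions(piece_id, m, k))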
