When will "Uncollected Garbage" be deleted?

Thank you for your thoughts. Unfortunately the data expired and somehow turned into uncollected garbage. That is what this topic is about, isn’t it?


It is just a question of bloom filter size. At the beginning you still had millions of pieces, which means the bloom filter needs to be big. Now that these pieces are gone from the satellite's point of view, it can send you a smaller bloom filter. That should do the trick.

An interesting thing I found while searching the logs: when the node restarts, or during normal operation, I see collector errors saying it can't delete pieces for various satellites, except for this one:
unable to delete piece {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
I searched through several node logs, some around a gigabyte long and going back as far as April. It seems too good to be true that this error never happens for that satellite. I see it most often for 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S.

Is it possible that this error simply isn't logged, or that the collector never gets to that satellite? I'm asking because I just found 650+ GB of trash from SL deleted at the beginning of this month.

I've noticed that on a brand new node, only 9 days old, I'm getting these errors.

2024-08-07T14:44:38Z    WARN    collector       unable to delete piece  {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "JKPT7I57HVHRPYAQKBQ7GYC2GAJIQER6V2JDWCQKRQTJTQJGHHJA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:290\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).DeleteWithStorageFormat:270\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0:356\n\tstorj.io/storj/storagenode/collector.(*Service).Collect.func1:88\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:71\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:579\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:87\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:65\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:61\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-08-07T14:44:38Z    WARN    collector       unable to delete piece  {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "CAIRW5CAGJQPKU6MU5NJCSJO4MONXBJFECRHADDQ4CT7WEUNNSAQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:290\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).DeleteWithStorageFormat:270\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0:356\n\tstorj.io/storj/storagenode/collector.(*Service).Collect.func1:88\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:71\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:579\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:87\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:65\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:61\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

I did find a gc run for that satellite the previous day.

2024-08-06T14:06:18Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "createdBefore": "2024-07-31T17:59:59Z", "bloomFilterSize": 407603, "Process": "storagenode"}
2024-08-06T14:09:10Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "piecesSkippedCount": 0, "Process": "storagenode", "piecesCount": 2180102, "trashPiecesCount": 27085, "piecesTrashed": 27085}

It seems that the bloom filter is deleting TTL data before it expires. For some reason the trash size on the dashboard didn't update and still shows 0. By my count using du -sh it currently has 6.3 GB in the trash.

Do you see a gc log line before those errors?

So you're saying that we simply need a larger bloom filter from the satellite? Is that in the works? Will that happen?

This only means that the Garbage Collector has already moved them to the trash. This may happen if the client deleted the TTL data before it expired, and there is an easy check:

This is a bug in the current version. You may restart the node to allow the used-space-filewalker to update the trash usage.

You need to use du -s --si instead, because the dashboard shows numbers in SI units (base 10).
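For example (a sketch; the path here is just an example, point it at the trash folder inside your own storage location):

du -s --si /mnt/storagenode/storage/trash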

It’s happening already. Search for retain and Prep|Move.
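For example, something like this should work, assuming the node logs to a file (the path is a placeholder, adjust it to your setup):

grep retain /path/to/storagenode.log | grep -E "Prep|Move"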

The filewalker also does not update the trash usage.

@littleskunk, thanks for the explanation! So, if I understand correctly, as the amount of data reported by SL decreases, the bloom filters should also shrink, leading to more efficient cleanup.

However, based on my current observations (see graph below), I'm still seeing only small amounts of data being cleared with each bloom filter, around 60 GB at a time on average, and that amount is not growing, even though I estimate there is at least 4 TB that needs to be cleaned up. At this rate, it feels like the process will take a very long time to complete.

Is there anything specific I can do to help debug or optimize the BF/GC process? Any further insights would be greatly appreciated!

Is there any kind of parameter when creating the BF that determines which file creation dates are considered?
It seems to me that for SLC the BF does not delete any file created more than 45 days earlier than the "Created Before" date visible in the BF's log line.
This is a repeating pattern I see in my trash folders for SLC.

For the other satellites it is not like this; I see files created much earlier.
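For reference, this is roughly how I check the file dates in a trash folder (a sketch; GNU find assumed, the mount point is an example, and the satellite folder shown is the Saltlake one also used in the find command further down this thread):

find /mnt/storagenode/storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa -type f -printf '%TY-%Tm-%Td\n' | sort | uniq -c

That prints how many trashed files carry each modification date, which for pieces is effectively the upload date.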


Did it finish without issues for all trusted satellites?

The creation time there doesn't matter. The BF is constructed to instruct the node which pieces must be kept, not which should be deleted.

It is for a different satellite ID. I wouldn't worry too much about that right now. I do see similar messages in my logs but haven't tracked it down. I do know there is a bug with TTL deletes not updating the TTL db, so after a restart the node basically tries to delete the same pieces a second time. Not great, but deleting the same pieces twice shouldn't leave any garbage behind. It is worth checking later on, once the garbage is under control and we want to minimize IO usage; in that regard deleting the same pieces twice isn't ideal.

Just as a quick check could you run this query to rule out a timezone issue?
SELECT piece_expiration FROM piece_expirations ORDER BY piece_expiration LIMIT 10;
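For example, with the node stopped or against a copy of the database (a sketch; the database path is an example, adjust it to where your databases are stored):

sqlite3 /mnt/storagenode/storage/piece_expiration.db "SELECT piece_expiration FROM piece_expirations ORDER BY piece_expiration LIMIT 10;"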

Correct. It isn't a long-term solution. The moment the simulated load pattern hits US1 with real customer data, we will get a problem. So we still need to find the root cause, otherwise it will happen again.

Is your dashboard tracking any garbage collection? A US1 bloom filter will not be applied to SLC data. So there will be something like 3 garbage collection runs moving almost no data before the SLC bloom filter finally arrives.

For now I would assume SLC garbage collection works within expectations. The number of pieces we uploaded to SLC was bound to break the maximum bloom filter size; I set that expectation before we uploaded the first file. It is just a numbers game. Bigger nodes will hit almost 100 million pieces and the bloom filters aren't big enough for that, at least for now. In the long run we need to scale the bloom filter size or find a different solution. In the short run it should work on SLC with no bloom filters at all. The bloom filters are the backup plan in case the TTL db gets compromised and needs to be rebuilt or something like that. If we need bloom filters on all nodes then the TTL db has somehow failed. That's what I'm trying to find out first: what's wrong with the TTL db.


Yes, there is. The bloom filter comes with a creation date. It isn't a parameter we can change; it is simply the moment the satellite took a copy of the satellite DB and started to create bloom filters from it. The creation date is the copy date.

That date tells your node not to delete a piece that was uploaded 5 seconds ago and therefore wasn't present when the bloom filter was created hours or days earlier. So we can't just manipulate the creation date, because that would wipe recent uploads. We have to accept that the bloom filter comes with some delay; sometimes the delay is short and sometimes it is a bit larger.

I believe on top of the creation date the node adds a bit of safety threshold to compensate for possible clock drift.

I don't think that observation is correct, but there might be an easy explanation. If the bloom filter was created, let's say, 15 days ago, it would contain pieces uploaded 30 days prior that had not expired by that time. So on SLC it will always look like +30 days compared to other satellites, simply because there is almost no other activity on SLC. If I used SLC for my backups, it would overwrite a few files within 7 days and then you might see 7-day-old pieces in the trash folder. So the artificial upload pattern with always +30 days might explain your observation to some degree.
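If you want to double check this on your own node, something like the following should return (close to) zero (a sketch; the mount point is an example, the satellite folder shown is Saltlake's, and the date should be the createdBefore value from the matching gc-filewalker log line, converted to your local timezone):

find /mnt/storagenode/storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa -type f -newermt "2024-07-31 17:59:59" | wc -l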

More data

expiration             uploaded
date         count     date         count

2024-08-09   10502     2024-07-10   868114
2024-08-10    6742     2024-07-11   510635
2024-08-11      16     2024-07-12        0

So it looks like only a fraction of the pieces get inserted into piece_expiration.db.
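The expiration column can be reproduced with a query along these lines (a sketch; the database path is an example, adjust it to your setup):

sqlite3 /mnt/storagenode/storage/piece_expiration.db "SELECT date(piece_expiration) AS day, count(*) FROM piece_expirations GROUP BY day ORDER BY day;"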

@donald.m.motsinger, yes, that is interesting. You are counting the pieces in the blob folders to calculate the number of uploaded pieces, right? Can you share the command for that?

Hi @Balage76, let me follow up on your findings! The presence of those older files might be due to the fact that the BF isn’t able to achieve 100% cleanup, typically leaving around 10% behind. I believe this is a known limitation.

There was also a TTL bug that caused files to be overwritten, resulting in uncollectable pieces that needed to be addressed by the backup BF/GC process. This could explain why you’re seeing files that are expected to have expired but are still on your node.

The remaining files probably haven’t expired yet and should be collected by the TTL Collector process in due time.

So, it seems we’re currently in a situation where we’re investigating the TTL issues while relying on the GC process to catch up and clean up the remaining uncollected pieces in the meantime.

Yes, it finished successfully for all 4 satellites.

find /mnt/storagenode7/node/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ -type f -newermt "2024-07-11 01:00:00" ! -newermt "2024-07-12 00:59:59"

Adjust according to your timezone.
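If you only need the count, piping it into wc -l should do:

find /mnt/storagenode7/node/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ -type f -newermt "2024-07-11 01:00:00" ! -newermt "2024-07-12 00:59:59" | wc -l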

It will take quite some time on a big node. I did it on a 2 TB node.

Same query as yesterday, but this time the numbers line up almost perfectly:

storagenode@pi5node2:~ $ grep uploaded /home/storagenode/logs/storagenode8.log | grep 2024-08-08 | grep "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE" | grep '"PUT"' | wc -l
82061

storagenode@pi5node2:~ $ sqlite3 /home/storagenode/ssd/sn8/dbs/piece_expiration.db "SELECT count(1) FROM piece_expirations WHERE hex(satellite_id)='7B2DE9D72C2E935F1918C058CAAF8ED00F0581639008707317FF1BD000000000' AND piece_expiration>'2024-09-06 22:00:00';"
81693

Now I wonder if the TTL inserts are paused while the TTL cleanup job is running, because that cleanup job might be the only difference between my numbers from yesterday and today. I did an rclone copy on the same drive yesterday, which does slow down the TTL cleanup job.

Edit: Nope, that can't be it. My second node has been running the cleanup job nonstop for more than 24 hours and its numbers also line up almost perfectly. There is maybe a 5% difference, but not the 10% from yesterday. So it can't be a full stop of inserts.