Your TTL DB doesn’t contain any old entries. So we know the TTL cleanup is working as expected and not falling behind.
This leaves two other options on the table: either the used space numbers on the dashboard are off, or the TTL DB didn’t persist all the uploads in the first place.
Something I find strange is how low the numbers in that DB are. For comparison, here is my node:
2024-08-21|777680
2024-08-22|1622570
2024-08-23|103405
2024-08-24|93795
2024-08-25|98219
2024-08-26|104809
2024-08-27|102702
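If you want to compare, the same per-day counts can be pulled from piece_expiration.db. A minimal sketch, assuming the usual piece_expirations table with a piece_expiration timestamp column (check your node’s schema, and run it against a copy of the DB so you don’t lock the live one):

```python
# Per-day counts of TTL records in the storagenode's piece_expiration.db.
# Assumes a piece_expirations table with a piece_expiration timestamp
# column; run on a copy of the DB to avoid locking the live node.
import sqlite3

DB_PATH = "/path/to/storage/piece_expiration.db"  # adjust to your data location

con = sqlite3.connect(DB_PATH)
rows = con.execute(
    "SELECT date(piece_expiration) AS day, count(*) "
    "FROM piece_expirations GROUP BY day ORDER BY day")
for day, count in rows:
    print(f"{day}|{count}")
con.close()
```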
I would suggest calculating the size of the subfolder pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa in your storage/blobs folder.
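Something like this should do it (the path is illustrative, point it at your own data location):

```python
# Sum the on-disk size of one satellite's blobs subfolder.
import os

FOLDER = "/path/to/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa"

total = 0
for root, _dirs, files in os.walk(FOLDER):
    for name in files:
        try:
            total += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # a piece may be deleted while we walk
print(f"{total} bytes ≈ {total / 1e12:.2f} TB")
```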
Yes, but in the beginning the randomization of piece names wasn’t sufficient, so part of the data was overwritten. That means it should be collected by the garbage collector.
Or your node might have had a “database is locked” issue at the time this data was uploaded. However, I guess that data should be collected by the GC anyway.
Do you have any bloom filters in the retain directory in the data location?
My nodes are still processing them one by one, with four more waiting:
There can be more than 1791 pieces. Expiration records are cached in memory first and then flushed to the database, so unfortunately the number of missing records is unknown.
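For illustration, here’s a toy sketch (not the actual storagenode code, schema simplified) of how buffered expiration records can go missing if the process dies before the buffer is flushed:

```python
# Toy sketch, NOT the storagenode implementation: TTL records are
# buffered in memory and written to the database in batches. Any
# records still sitting in the buffer when the process crashes (or
# when the flush fails with "database is locked") are lost, and those
# pieces can then only be removed by garbage collection.
import sqlite3

class BufferedExpirationStore:
    def __init__(self, con, flush_every=1000):
        self.con = con
        self.buffer = []            # records not yet persisted
        self.flush_every = flush_every

    def add(self, piece_id, expires_at):
        self.buffer.append((piece_id, expires_at))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        # A crash anywhere before the commit drops the whole batch.
        self.con.executemany(
            "INSERT INTO piece_expirations VALUES (?, ?)", self.buffer)
        self.con.commit()
        self.buffer.clear()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE piece_expirations (piece_id TEXT, expires_at TEXT)")
store = BufferedExpirationStore(con)
store.add("piece-1", "2024-08-21")
# If the node dies here, "piece-1" never reaches the database: TTL
# cleanup will not see it, and the piece waits for a bloom filter instead.
```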
So it seems the size of the BF is not enough for your node to collect all the garbage…
Are you sure the report is valid? Do you really have the files? If the database includes space usage from a deprecated satellite, the tool may not give reliable numbers…
This isn’t really possible. The earnings calculator only includes satellites with activity for that month, and everything is then joined against those active satellites. Even if the data for inactive satellites is still in the DBs, it wouldn’t be included in the calculations. And if for some reason it were included, you would also see a record for that satellite in the bottom overview, which is not the case here.
It seems I have the exact same issue on my nodes: TTL data did not get deleted, while there are no apparent errors.
In the screenshot below, you can see I store about 8 TB of data (right side, reported by the satellites, with no gaps), while 15.1 TB is occupied on the hard disk (left side). So it seems half of my capacity is occupied by trash.
I also wonder when trash will be released by the BF. It looks like the BFs from SL haven’t been collecting much garbage lately:
I’m just here to inform you that I have the same issue. If you need data, please give us a simple list of commands and I will post the results.
If not… just know that donald is not alone.
Can we please stay on topic and focus on how to move forward in solving this issue? It’s important to address the problem at hand and explore potential solutions. Let’s work together to find ways to improve and optimize the system. Your constructive input is appreciated!
Every file should be accounted for. My original node, running since the start, has many files that were last accessed in 2019. This seems to be forgotten data that’s taking up 9 TB I could be earning money on, but instead my disk is full.
The bloom filter tells your node what data is alive from the satellite’s point of view. Although this comes with a delay and a false positive rate of 10%, no ‘forgotten’ data should survive forever.
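To make that concrete, here’s a toy sketch in Python (illustrative only, not the node’s actual code): the filter is built from the pieces the satellite considers alive, and the node trashes whatever doesn’t match; a false positive can only ever retain extra garbage, never delete live data.

```python
# Toy bloom filter to illustrate the GC semantics: the satellite builds
# the filter from pieces it considers alive; the node moves every piece
# that does NOT match to trash. A false positive (~10% by design) keeps
# a garbage piece around, but a live piece can never be trashed.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=64):
        self.size = size_bits
        self.bits = 0

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        # derive three positions from independent slices of the hash
        return [int.from_bytes(digest[i:i + 4], "big") % self.size
                for i in (0, 4, 8)]

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

alive_on_satellite = {"piece-a", "piece-b"}
stored_on_node = {"piece-a", "piece-b", "garbage-1", "garbage-2"}

bf = BloomFilter()
for piece in alive_on_satellite:
    bf.add(piece)

# everything the filter does not recognize goes to trash
trash = {p for p in stored_on_node if not bf.might_contain(p)}
print("moved to trash:", trash)  # the garbage pieces, minus any false positives
```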
According to the satellites, it just dropped to ~250 GB used on 29-07-2024, while I still have 3.3 TB used on disk. I’ll update if that number goes back up, but 250 GB of paid data out of 3.3 TB is crazy! ZFS agrees that 3.29 TB is used in that dataset.
How exactly does the Bloom filter work? Does the satellite send a filter that includes all the pieces that should be stored, and the node deletes everything that isn’t in the filter? Or does it work the other way around? I know BFs can have false positives, but not false negatives.
The Average Disk Space Used graph isn’t always reliable because some satellites can lag in reporting the used space back to the node. In my case, as you can see from the screenshot, the SL and US1 satellites haven’t reported any data in the last few days. You can verify this on your dashboard by selecting the SL and/or US1 satellites, which will likely show 0 space used.
Based on your screenshot, a more plausible reading is that you have around 1.7 TB of actually stored data. That suggests roughly 1.8 TB of uncollected garbage, i.e. about half of the occupied space, similar to what I’m experiencing.
As more Storage Node Operators (SNOs) report similar issues, it’s becoming clear that there’s a challenge with cleaning up used space, possibly related to the cleanup of expired TTL data. The key question remains: When will “Uncollected Garbage” be deleted?
For example, when I look at my graphs for the SL satellite for this month alone, my stored data dropped from 8.3 TB to 2.4 TB, a decrease of 5.9 TB. However, this data hasn’t been removed from my hard disks; I verified this by checking the actual disk usage rather than just relying on the node’s calculations. Over the last couple of weeks, garbage collection has only cleared about 1.5 TB, leaving 4.4 TB still unaccounted for.
Looking at the Bloom Filters (BFs) for SL, the last significant cleanup event was three days ago, and only 100 GB of garbage is currently in the trash folders. I’m wondering when I can expect the remaining 4.4 TB to be cleaned up by the BFs?
Let’s continue to share observations and work together to resolve this issue. Any insights or updates from the team on this matter would be greatly appreciated!