Use 10M items / 4 MB / 1 hash, then change the size to 2 MB. For me it shows the false positive rate increasing from 26% to 46%.
It will still match all of the 10M (good fish), but you will catch more bad fish.
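Those numbers come straight from the standard Bloom filter estimate p ≈ (1 − e^(−k·n/m))^k for m bits, n items and k hash functions. Here is a minimal sketch that reproduces them (assuming the sizes above are megabytes; exact rounding depends on the calculator):

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate returns the usual Bloom filter estimate
// p ≈ (1 - e^(-k*n/m))^k for m bits, n items and k hash functions.
func falsePositiveRate(mBits, n, k float64) float64 {
	return math.Pow(1-math.Exp(-k*n/mBits), k)
}

func main() {
	const n = 10_000_000 // 10M pieces
	const k = 1          // one hash function

	for _, megabytes := range []float64{4, 2} {
		mBits := megabytes * 8 * 1_000_000 // filter size in bits
		p := falsePositiveRate(mBits, n, k)
		fmt.Printf("%.0f MB filter: false positive rate ≈ %.0f%%\n", megabytes, p*100)
	}
	// Prints roughly 27% for 4 MB and 46% for 2 MB, in line with
	// the 26% -> 46% figures from the calculator.
}
```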
Let me try another (imperfect) metaphor:
Say you would like to send me a vase. You need a box, the vase, and some packing peanuts to fill the gaps…
You need a box which is big enough, but not too big (that would be too expensive); it should have enough room for the packing material around the vase.
A Bloom filter should be big enough to include the (hashed bits of the) items, plus some additional space of zeros to decrease false positive matches.
With a smaller box but the same vase, you have less space for packing peanuts… which increases the false positive rate (or the chance of a broken vase ;-) ).
Yeah, I had a brain fart in my previous post. Somehow I thought there would be a minimum size required to match all pieces, but it would eventually just degrade into 100% false positives.
The calculator is useful, thanks for that link. But the vase analogy didn’t really work for me. Regardless, I get it now. Though you didn’t mention whether my assumption here was correct.
Nor this question
I’m thinking yes to both, based on what you said and the calculator. But just wanted to check.
I got a BF today, but the result is disappointing. Or will there be another BF on Monday?
2024-08-24T13:03:54Z INFO retain Prepared to run a Retain request. {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-08-17T17:59:59Z", "Filter Size": 35000003, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-24T16:01:30Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 81848, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 37422647, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "2h57m36.019750481s", "Retain Status": "enabled"}
If the reports from SL are correct, then I have 0.73 TB of paid data from this satellite. The used-space filewalker from yesterday reported 11 TB of data in that folder.
Does that mean the bloom filter was 35 MB? Is that the max size currently allowed?
A 35 MB BF for 0.73 TB of paid data? Is that reasonably possible? It seems strange.
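For reference, a rough sanity check of the numbers in that log (just a sketch, not authoritative: it assumes the logged Filter Size is in bytes and treats the node's full piece count as the filter's element count, which overstates it, since the satellite only puts the pieces it wants to keep into the filter; the hash count is an assumption as well):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Numbers taken from the retain log lines above.
	const filterSizeBytes = 35_000_003 // "Filter Size"
	const piecesOnNode = 37_422_647    // "Pieces count"

	// Filter bits per piece the node holds on disk. The real filter only
	// covers the pieces the satellite wants to keep, so this ratio is a
	// pessimistic worst case.
	bitsPerPiece := float64(filterSizeBytes) * 8 / float64(piecesOnNode)
	fmt.Printf("filter size: %.1f MB, bits per on-disk piece: %.2f\n",
		float64(filterSizeBytes)/1e6, bitsPerPiece)

	// Standard estimate p ≈ (1 - e^(-k/bitsPerPiece))^k; k is assumed,
	// not read from the satellite's actual configuration.
	for k := 1.0; k <= 3; k++ {
		p := math.Pow(1-math.Exp(-k/bitsPerPiece), k)
		fmt.Printf("k=%.0f: worst-case false positive rate ≈ %.1f%%\n", k, p*100)
	}
}
```

So the filter really is about 35 MB, roughly 7.5 bits per piece currently on disk.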
That’s a lot of pieces. Hopefully the 0.73 TB satellite report is just temporarily broken. Are you looking at the report for a single day or the “Average” displayed at the top of the graph?
So it sent about 0.2% of your pieces to the trash (81,848 of 37,422,647)? That doesn’t seem like much, but I guess if the TTL collector is deleting files correctly, there shouldn’t be much left for the BF to delete, right?
I don’t currently run any SLC nodes. Is all SLC test data set with a TTL of 30 days? Just out of curiosity, have you looked inside one of your blobs/xx folders for SLC to get an idea of how old your pieces are? Perhaps you could check whether most of the files in one of those folders are less than 40 days old?
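Something like this could do the counting (a sketch only; the path is hypothetical, and it assumes a piece file's mtime roughly reflects when it was stored):

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

func main() {
	// Hypothetical path: one two-letter prefix folder inside the SLC
	// satellite's blobs directory. Adjust to your own storage location.
	dir := "/path/to/storage/blobs/<slc-satellite-folder>/aa"

	cutoff := time.Now().AddDate(0, 0, -40) // "older than 40 days" threshold
	var older, newer int

	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil || d.IsDir() {
			return walkErr
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		// Assumption: a piece file's mtime roughly reflects its upload time.
		if info.ModTime().Before(cutoff) {
			older++
		} else {
			newer++
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		return
	}
	fmt.Printf("older than 40 days: %d, newer than 40 days: %d\n", older, newer)
}
```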
Because it includes the TTL pieces which have not been cleared on the satellite’s side yet.
It should be improved with the latest change, but I would expect to see the effect only next week.
Likely yes, if it differs significantly from a previous report (you should pay attention to the daily reports, not the average used space).
Yes, it’s expected behavior. However, since not all TTL pieces were registered in the TTL database, these orphaned pieces should still be considered garbage and should be excluded from the BF regardless of whether they were deleted from the satellite’s databases or not, because they are already expired and should be removed anyway.
After the latest change, the GC should remove these orphaned TTL pieces too.
I do not have access there, but I believe it struggles with deleting these millions of segments; otherwise the GC would collect much more of the expired data that was not collected by the TTL collector.
It did make a big jump, going from 60 GB to 400 GB, which is definitely progress! But when you stack that against the 2 TB of uncollected garbage, there’s still a bit of catching up to do. Do we expect the upcoming BFs to be even more efficient at taking out the trash, or was this already the expected big cleanup BF?
It’s both. We implemented a change which allows us to enable a feature flag to not add expired TTL data to the BF, so that data would be moved to the trash on the nodes and the BF should become more efficient (we are always working on improvements in this area).
The latter means that these sporadically performed audits from the trash don’t happen often enough to assume there is a bug somewhere. However, I would prefer to be more careful there, as some SNOs empty the trash from time to time.
Just did a deep dive into the logs for this node, and a few things jumped out at me:
There are lots of retain errors popping up over the last few days, and plenty of warnings about files not being found, which sounds like a classic case of ‘ghost pieces’ hanging around.
2024-08-22T08:54:29Z WARN retain failed to trash piece {"Process": "storagenode", "cachePath": "config/retain", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Piece ID": "6YPK6D4E4ZBQIJXNP6CZBFISX7IRHBVJTGP43NBCBZXXRNHHMQPA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:340\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).TrashWithStorageFormat:406\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:422\n\tstorj.io/storj/storagenode/retain.(*Service).trash:428\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces.func1:387\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).processTrashPiece:112\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).writeLine:99\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).Write:78\n\tio.copyBuffer:431\n\tio.Copy:388\n\tos.genericWriteTo:269\n\tos.(*File).WriteTo:247\n\tio.copyBuffer:411\n\tio.Copy:388\n\tos/exec.(*Cmd).writerDescriptor.func1:578\n\tos/exec.(*Cmd).Start.func2:728"}
I also spotted something quirky: the ‘Failed to delete’ field is showing a negative number. I’m guessing that means it tried to delete pieces that didn’t even exist? How could that be the case?
2024-08-25T00:52:44Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 23678, "Failed to delete": -23678, "Pieces failed to read": 0, "Pieces count": 5633273, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "17m24.819728138s", "Retain Status": "enabled"}
I could not find a “Moved pieces to trash during retain” log event for SLC yet, even though the process started a while ago. Does that mean it is still running? Fingers crossed more data will be trashed!
2024-08-25T03:38:27Z INFO retain Prepared to run a Retain request. {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-08-19T17:23:28Z", "Filter Size": 9708635, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}