Current situation with garbage collection

Please refer to my attempt at explaining bloom filters; maybe it will help here.

A database of all existing pieces. There should be no need to store information on removed pieces.

32 bytes.

Each bloom filter has a chance of garbage-collecting each unused piece. So after a piece is removed, even if one bloom filter misses it, the next one will do the job.

Bloom filters do not store information on which pieces should be deleted. Please refer to my post linked above for an explanation.
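
To make this a bit more concrete, here is a minimal sketch in Go (not Storj's actual implementation; the `bloomFilter` type and the hashing scheme are simplifications I made up for illustration). The filter is built from the pieces that should be kept, and the node collects anything that fails the membership test. False positives only delay collection until a later filter.

```go
// Toy illustration of bloom-filter-driven garbage collection: anything that
// fails the membership test cannot be a kept piece, so it is safe to delete;
// a false positive just means some garbage survives until a later filter.
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a toy fixed-size filter with k derived hash positions.
type bloomFilter struct {
	bits []bool
	k    int
}

func newBloomFilter(size, k int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, size), k: k}
}

// positions derives k bit positions from a piece ID using FNV hashing with a
// per-hash seed byte (a simplification of real double hashing).
func (f *bloomFilter) positions(pieceID string) []int {
	out := make([]int, f.k)
	for i := 0; i < f.k; i++ {
		h := fnv.New64a()
		h.Write([]byte{byte(i)})
		h.Write([]byte(pieceID))
		out[i] = int(h.Sum64() % uint64(len(f.bits)))
	}
	return out
}

func (f *bloomFilter) add(pieceID string) {
	for _, p := range f.positions(pieceID) {
		f.bits[p] = true
	}
}

// mayContain is always true for added pieces, and occasionally true for
// pieces that were never added (false positives).
func (f *bloomFilter) mayContain(pieceID string) bool {
	for _, p := range f.positions(pieceID) {
		if !f.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	// Satellite side: build the filter from pieces that still exist.
	existing := []string{"piece-a", "piece-b", "piece-c"}
	filter := newBloomFilter(64, 3)
	for _, id := range existing {
		filter.add(id)
	}

	// Node side: keep anything the filter may contain, collect the rest.
	stored := append(existing, "piece-deleted-1", "piece-deleted-2")
	for _, id := range stored {
		if filter.mayContain(id) {
			fmt.Println("keep   ", id)
		} else {
			fmt.Println("collect", id) // a false positive here just waits for the next filter
		}
	}
}
```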

In most scenarios, especially those where you do not have enough memory to cache file metadata, batch removals are faster: your node only needs to scan each subdirectory once, instead of walking through them in random order.
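
For illustration, here is a rough sketch of what such a batched removal pass could look like (again in Go; the two-character prefix directory layout, paths, and file names are assumptions for the example, not the actual node code). Removals are grouped by subdirectory, so each directory is listed once and all matching files are deleted in that single pass.

```go
// Sketch of batched piece removal: group targets by subdirectory, then visit
// each subdirectory exactly once instead of touching it repeatedly.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// batchRemove deletes the given piece files, scanning each subdirectory once.
// It assumes each file name is at least two characters long and lives in a
// subdirectory named after its two-character prefix.
func batchRemove(root string, pieceFiles []string) error {
	// Group target file names by the subdirectory they live in.
	byDir := make(map[string]map[string]bool)
	for _, p := range pieceFiles {
		dir := filepath.Join(root, p[:2])
		if byDir[dir] == nil {
			byDir[dir] = make(map[string]bool)
		}
		byDir[dir][p] = true
	}

	// One pass per subdirectory: list the entries once, remove the matches.
	for dir, targets := range byDir {
		entries, err := os.ReadDir(dir)
		if err != nil {
			return err
		}
		for _, e := range entries {
			if targets[e.Name()] {
				if err := os.Remove(filepath.Join(dir, e.Name())); err != nil {
					return err
				}
				fmt.Println("removed", filepath.Join(dir, e.Name()))
			}
		}
	}
	return nil
}

func main() {
	// Hypothetical usage: remove a batch of collected pieces under ./pieces.
	if err := batchRemove("pieces", []string{"aa11.piece", "aa22.piece", "bb33.piece"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```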
