Why process the older BFs first, followed by the newer ones?
@toyoo responded to this and @Ambifacient showed how each filter deleted pieces.
Doesn’t this approach lead to unnecessary and avoidable I/O operations?
We recently changed from processing 3 bloom filters concurrently to processing 1 at a time, to reduce the I/O performance impact.
This is available in the latest published version.
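As a rough illustration of the concurrency change (this is not the actual storagenode code; `bloomFilter`, `processFilter`, and the satellite names are all hypothetical), here is a Go sketch where the number of filters processed at once is a tunable semaphore, now set to 1 so garbage collection walks no longer compete with each other for disk I/O:

```go
package main

import (
	"fmt"
	"sync"
)

type bloomFilter struct{ satellite string }

// processFilter stands in for the garbage collection walk that deletes
// pieces not present in the filter.
func processFilter(f bloomFilter) {
	fmt.Println("processing filter from", f.satellite)
}

func main() {
	filters := []bloomFilter{{"sat-A"}, {"sat-B"}, {"sat-C"}}

	// Tunable concurrency: was 3, now 1.
	const concurrency = 1

	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for _, f := range filters {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while another filter runs
		go func(f bloomFilter) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			processFilter(f)
		}(f)
	}
	wg.Wait()
}
```

With `concurrency = 1` the loop degenerates to sequential processing; keeping it as a semaphore just makes the old and new behavior two values of the same knob.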
The limit seems to be 25 MB at the moment. However, that is still too low for the largest available HDDs.
Computing larger bloom filters requires more infrastructure resources on our side and may cause problems for some storage nodes.
We are trying to find the sweet spot between deleting as much garbage as possible and avoiding issues with bloom filter computation and processing.
We will gradually increase the size to find the best value.
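For intuition about why a fixed 25 MB cap hurts larger nodes, here is a back-of-the-envelope sketch using the standard bloom filter false positive formula for an optimally configured filter, p ≈ exp(−(m/n)·(ln 2)²), with m bits and n pieces. The piece counts are invented for illustration and this is not Storj's actual sizing logic; a false positive here is a garbage piece the filter mistakenly marks as "keep", so it survives the run:

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate approximates the false positive rate of an
// optimally configured bloom filter of filterBytes holding `pieces` items.
func falsePositiveRate(filterBytes, pieces float64) float64 {
	bitsPerPiece := filterBytes * 8 / pieces
	return math.Exp(-bitsPerPiece * math.Ln2 * math.Ln2)
}

func main() {
	const filterBytes = 25 << 20 // the 25 MB cap mentioned above

	// Hypothetical piece counts for increasingly large/full drives.
	for _, pieces := range []float64{10e6, 50e6, 200e6} {
		p := falsePositiveRate(filterBytes, pieces)
		fmt.Printf("%4.0fM pieces: ~%.1f%% of garbage survives each run\n",
			pieces/1e6, p*100)
	}
}
```

Roughly: the same 25 MB filter that is essentially perfect at 10M pieces lets on the order of half the garbage survive at 200M pieces, which is why a fixed cap matters most on the largest HDDs.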