Multiple Bloom filters per satellite

Either I have a misconception about how Bloom filters are processed or you are misunderstanding me, and that matters for what I am talking about.

My basic understanding is that it does not really matter if my node misses a Bloom filter, because the next Bloom filter it receives covers basically the same pieces. The order may differ, but the end result is the same: pieces that should no longer be on the node get deleted by one Bloom filter or the other. If that is not correct, then of course my idea cannot work. But it was my impression that a node can miss Bloom filters.

So when my node has two or more different Bloom filters for one satellite, it processes them one after another and scans all files again for each of them.

Instead, my idea is to discard the earlier Bloom filters and traverse the files only once, using only the latest one.
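
To make the idea concrete, here is a rough Python sketch. It is not the storagenode code: the toy BloomFilter class, the (satellite_id, created_at, bloom_filter) tuples and the function names are all made up for illustration, and it assumes the filter lists the pieces to keep. A real node would presumably also have to skip pieces uploaded after the filter was created, which the sketch leaves out.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter for illustration only (hypothetical, not Storj's format)."""

    def __init__(self, size_bits=1024, hash_count=3, seed=0):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.seed = seed
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, piece_id):
        # Derive hash_count bit positions from the seed and the piece ID.
        for i in range(self.hash_count):
            data = f"{self.seed}:{i}:{piece_id}".encode()
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, piece_id):
        for pos in self._positions(piece_id):
            self.bits[pos] = 1

    def __contains__(self, piece_id):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(piece_id))


def latest_per_satellite(pending):
    """Keep only the newest pending filter for each satellite.

    pending: iterable of (satellite_id, created_at, bloom_filter) tuples.
    Returns a dict mapping satellite_id to (created_at, bloom_filter).
    """
    latest = {}
    for satellite_id, created_at, bloom in pending:
        if satellite_id not in latest or created_at > latest[satellite_id][0]:
            latest[satellite_id] = (created_at, bloom)
    return latest


def collect_garbage(pieces, bloom):
    """Walk the stored pieces once and keep those the filter retains.

    Returns (kept_pieces, number_of_piece_checks).
    """
    kept = {piece for piece in pieces if piece in bloom}
    return kept, len(pieces)
```

With two queued filters for the same satellite, latest_per_satellite leaves one, so collect_garbage walks the pieces once instead of twice. The price is that garbage the older filter would have caught but the newer one lets through as a false positive stays on disk until a later filter arrives.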

Edit: AFAIK we store the Bloom filters on the node's disk for two reasons: to be able to resume after a node restart, and to better accommodate slower nodes. We may also need to keep in mind that Bloom filter sizes are constantly increasing.
So now you have the same slower nodes and larger Bloom filters, and those nodes have to run through the same pieces multiple times. This sounds like a lot of IO that could be avoided.
