elek
January 11, 2024, 4:08pm
Related issue (opened 11:41AM - 11 Jan 24 UTC):
Currently we have a maximum memory limit for bloom filters, however that has a side-effect of them being completely filled on nodes with a large number of pieces.
By simulating the bloom filter effectiveness we can see different behaviours: https://github.com/storj/experiments/blob/main/simulate-bloom-filter/main.go
Here are approximate results for different piece counts and maximum bloom filter sizes:
```
 satellite   add/delete  storage-node  bloom-size  ideal-bloom-size
 1_000_000       50_000     1_005_000     583 KiB   583 KiB
 2_000_000       50_000     2_005_000     1.1 MiB   1.1 MiB
 4_000_000       50_000     4_010_000     2.0 MiB   2.3 MiB
 8_000_000       50_000     8_060_000     2.0 MiB   4.6 MiB
10_000_000       50_000    10_015_000     2.0 MiB   5.7 MiB
14_000_000       50_000    14_450_000     2.0 MiB   8.0 MiB
16_000_000       50_000    16_500_000     2.0 MiB   9.1 MiB (almost unstable)
20_000_000       50_000      unstable     2.0 MiB  11.4 MiB
20_000_000       50_000    20_130_000     4.0 MiB  11.4 MiB
26_000_000       50_000    26_300_000     4.0 MiB  14.8 MiB (almost unstable)
26_000_000       50_000    26_140_000     5.0 MiB  14.8 MiB
```
So, if we use the currently calculated optimal size, the overhead is significantly smaller than our false positive rate would imply. Randomizing the seed clearly helps.
This however falls apart when the bloom filter is completely filled -- this seems to happen around 16-20M pieces for 2 MiB and 22-26M pieces for 4 MiB.
Currently our largest node has 26M pieces, so bumping the bloom filter size to 4 MiB will probably help. We may also want to adjust our bloom filter size calculation to suggest a 2x (or 1.5x) smaller bloom size than the theoretical result suggests.
Bumping it to 5 MiB should solve it somewhat, however we need to be mindful of the drpc packet limit, which may need to be changed -- alternatively, we need a new message type to send larger bloom filters.
One interesting approach to try is to create bloom filters only for a subsection of piece IDs, rather than all of them. This would shrink the number of piece IDs put into the bloom filter, at the cost of a longer tail to clean up the trash. If we split it into two, e.g. only pieces `<0x80...`, then our ideal sizes should be half what they are now.
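The split-by-prefix idea could be as simple as bucketing pieces by the first byte of their ID, so each garbage collection cycle only covers one half of the ID space. A hypothetical sketch (the `PieceID` stand-in type and `inFirstHalf` helper are illustrations, not storj's actual API):

```go
package main

import "fmt"

// PieceID is a stand-in for storj's 32-byte piece identifier.
type PieceID [32]byte

// inFirstHalf reports whether a piece ID falls in the [0x00..0x7f] prefix
// range, i.e. the half of the ID space that one cycle's bloom filter
// would cover under the split-in-two scheme.
func inFirstHalf(id PieceID) bool {
	return id[0] < 0x80
}

func main() {
	ids := []PieceID{{0x12}, {0x7f}, {0x80}, {0xca}}
	for _, id := range ids {
		fmt.Printf("first byte 0x%02x -> first half: %v\n", id[0], inFirstHalf(id))
	}
}
```

Only the IDs in the covered half get added to (and checked against) the filter; the other half waits for the next cycle, which is where the "longer tail" for trash cleanup comes from.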
---
```[tasklist]
### Draft action items
- [ ] Adjust the ideal bloom filter size calculation to be smaller. We should experiment a bit more, but a 1.5x smaller size seems safe.
- [ ] Increase the bloom filter size as much as the drpc packet limit allows
- [ ] Create a new protobuf message that allows sending large bloom filters (larger than 4 MiB)
- [ ] Try a piece-ID selection strategy (e.g. what if the bloom filter ignored half or a quarter of the pieces?)
- [ ] Add a log/monkit warning when the fill rate of a bloom filter is above 0.95.
```
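The 0.95 fill-rate threshold in the last item can be estimated without scanning the filter: for m bits, k hash functions and n inserted pieces, the expected fraction of set bits is 1 − e^(−kn/m). A sketch (k = 5 here is an arbitrary illustration, not the value storj actually uses):

```go
package main

import (
	"fmt"
	"math"
)

// expectedFillRate returns the expected fraction of set bits in a bloom
// filter of mBits bits after n insertions with k hash functions:
// 1 - (1 - 1/m)^(k*n) ~= 1 - exp(-k*n/m).
func expectedFillRate(n, mBits, k int) float64 {
	return 1 - math.Exp(-float64(k)*float64(n)/float64(mBits))
}

func main() {
	mBits := 2 * (1 << 20) * 8 // a 2 MiB filter
	// k = 5 is an assumption for illustration only.
	for _, n := range []int{8_000_000, 16_500_000, 20_000_000} {
		fmt.Printf("%10d pieces -> expected fill rate %.3f\n", n, expectedFillRate(n, mBits, 5))
	}
}
```

Under these assumptions a 2 MiB filter crosses the 0.95 mark in the mid-teens of millions of pieces, consistent with where the table above starts reporting "almost unstable".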
TL;DR: with a high number of segments on one node, the bloom filter is less effective than it should be. A fix is on the way.
The easiest way to check if you are affected is to count the blobs in the storage directory's ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa or v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa folders (US or EU satellite, respectively).
If you have at least 14M blobs (in one satellite folder) and see big discrepancies, feel free to post your numbers as a comment on the issue…
Example:
```
cd /storj/storj01/data/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
find -type f | wc -l
3665465
```
3 million. It should be fine.