Debugging space usage discrepancies

Based on simulations (see the GitHub issue), the use of bloom filters and delayed deletion shouldn't cause a problem: the expected overhead is at most 0.5-1%, possibly less.

The exception is nodes with a high number of blob files, because the bloom filter size is limited (due to an RPC limit). A fix for this limitation is currently being planned.
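Why a size cap hurts nodes with many blobs follows from the standard bloom filter formula: with m bits, k hash functions and n inserted items, the false positive rate is about (1 - e^(-kn/m))^k. With m capped, that rate climbs steeply once n gets large. Assuming the filter describes the pieces a node should keep, every false positive is a garbage piece that stays on disk. The sketch below prints the rate for a few node sizes; the 4 MiB cap and k = 7 are made-up numbers for illustration, not the real limits:

```shell
#!/bin/sh
# Illustration only: false positive rate of a bloom filter whose size is
# capped at MAX_BITS while the number of stored pieces (n) grows.
# MAX_BITS and HASHES are hypothetical values, not the real parameters.

MAX_BITS=$((4 * 1024 * 1024 * 8))   # hypothetical cap: 4 MiB of filter bits
HASHES=7                            # hypothetical number of hash functions

# fpr N: print the expected false positive rate (1 - e^(-kN/m))^k
fpr() {
  awk -v m="$MAX_BITS" -v k="$HASHES" -v n="$1" \
    'BEGIN { printf "%.6f\n", (1 - exp(-k * n / m)) ^ k }'
}

for n in 1000000 5000000 15000000 30000000; do
  printf '%9d pieces: false positive rate %s\n' "$n" "$(fpr "$n")"
done
```

Plugging in the actual cap and hash count would give real numbers; the point here is the shape of the curve, which is why only the nodes with very high blob counts are affected.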

These are estimates based on a simulator. You can help fix the issue by providing real data. If you have more than 15-20M blobs in one satellite folder, you can send me the list of your blobs (here or to marton@ company domain).

I can compare it with the list from the database and calculate the differences. (It would also help us test an improved version of the bloom filter.)
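For the comparison itself, standard tools are enough to diff the two lists. A minimal sketch, assuming the database side can be exported as one `./xx/name` path per line, in the same form `du` prints; that export format and the file names are hypothetical:

```shell
#!/bin/sh
# Sketch: diff the node's blob list (from `du --all | gzip`) against a
# hypothetical database export with one "./xx/name" path per line.
set -eu

# du prints "<size><TAB><path>"; blob files sit one level down in
# two-character prefix folders ("./xx/name"), so keep only those paths.
blob_paths() {
  awk -F '\t' 'split($2, p, "/") == 3 { print $2 }' | LC_ALL=C sort -u
}

# compare_lists NODE_LIST_GZ DB_LIST
compare_lists() {
  node_sorted=$(mktemp)
  db_sorted=$(mktemp)
  gzip -dc "$1" | blob_paths > "$node_sorted"
  LC_ALL=C sort -u "$2" > "$db_sorted"
  # comm requires both inputs sorted with the same collation (LC_ALL=C)
  echo "only on node (candidate garbage): $(LC_ALL=C comm -23 "$node_sorted" "$db_sorted" | wc -l | tr -d ' ')"
  echo "only in database (missing on node): $(LC_ALL=C comm -13 "$node_sorted" "$db_sorted" | wc -l | tr -d ' ')"
  rm -f "$node_sorted" "$db_sorted"
}

if [ $# -eq 2 ]; then
  compare_lists "$1" "$2"
fi
```

Entries present only on the node are the pieces the bloom filter failed to clean up (or pieces still waiting for delayed deletion), which is the overhead the simulations try to estimate.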

You can generate the list of your blobs with:

cd storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
du --all | gzip > ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz
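To check whether a satellite folder is in the 15-20M+ range before preparing and uploading the list, you can count its blob files first. A minimal sketch, assuming the usual layout of two-character prefix folders under the satellite folder; the script name `count-blobs.sh` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (count-blobs.sh): count the blob files in one
# satellite folder, e.g.
# storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
set -eu

count_blobs() {
  # -type f counts regular files only, so the prefix directories are skipped
  find "$1" -type f | wc -l | tr -d ' '
}

if [ $# -ge 1 ]; then
  echo "blob files in $1: $(count_blobs "$1")"
  # Optionally verify the compressed list before uploading it:
  #   gzip -t ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz
fi
```

Run it as `sh count-blobs.sh .` from inside the satellite folder. The `.txt.gz` written by the `du` command lands in the same folder, so it adds one to the count; like `du`, this walks every file and can take a while on large nodes.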

The resulting file is expected to be big (1-2 GB), so you may need to upload it somewhere temporarily (for example, to Storj with a free account :wink:).
