+1 for the answer from Alexey, just adding some more details:
One of the benefits of hashstore is eliminating all the walkers (the other one is speed: it works even on ext4, even with a high number of pieces).
Since we have a dedicated “metadata database” with all piece IDs and sizes, it’s enough to check that for the used size.
Also: as the records are spread across the database randomly, the code can just check the beginning of the database and estimate the full used space from that sample.
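The prefix-sampling idea can be sketched like this (a minimal Python illustration under my assumptions about the table layout, not the actual storagenode code):

```python
import random

def estimate_used_space(slots, fraction=0.1):
    """Estimate total piece bytes from a hash table's slots.

    Because pieces are placed by hash, any prefix of the table is an
    unbiased random sample: scan only the first `fraction` of slots,
    sum the piece sizes found there, and scale up.
    """
    n = max(1, int(len(slots) * fraction))
    sampled = sum(size for size in slots[:n] if size is not None)
    return int(sampled * len(slots) / n)

# Demo: 100k slots, half filled with 2 KiB pieces at random positions.
random.seed(1)
slots = [None] * 100_000
for pos in random.sample(range(100_000), 50_000):
    slots[pos] = 2048

true_size = 50_000 * 2048
estimate = estimate_used_space(slots)
print(f"true={true_size} estimate={estimate}")  # estimate lands within a few percent
```

The point is that only 10% of the table is read, yet the estimate is close, because random placement makes any prefix representative.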
But this is not the “used space”; it is the “useful space”: the sum of the pieces that we store.
We have another metric as well: the size of the log files on the disk.
The overhead on Select is usually 3-5% (we used STORJ_HASHSTORE_COMPACTION_ALIVE_FRACTION=0.6 and recently bumped it to 0.7, but usually even that is higher than needed). Note that the structure of the data on Select is different (e.g. the TTL vs. non-TTL ratio).
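As I understand that setting (a hedged sketch; the helper name is made up, not the real storagenode code): compaction rewrites a log file once the fraction of still-alive bytes in it drops below the configured alive fraction:

```python
def should_compact(alive_bytes: int, log_size: int,
                   alive_fraction: float = 0.7) -> bool:
    """Rewrite a log file when too much of it is dead space.

    With alive_fraction=0.7, a log is kept as long as at least 70% of
    its bytes belong to live pieces, i.e. up to 30% dead bytes are
    tolerated before the next compaction rewrites it.
    """
    if log_size == 0:
        return False
    return alive_bytes / log_size < alive_fraction

print(should_compact(alive_bytes=60, log_size=100))  # True: only 60% alive
print(should_compact(alive_bytes=95, log_size=100))  # False: 95% alive
```

So a higher alive fraction means more frequent rewrites but less dead space sitting on disk.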
In case you use Grafana / Prometheus, this is how we monitor the overhead:

```promql
1 - (
  sum by (environment_name, server_group) (hashstore{environment_name="${environment_name}", field="LenSet", db!="s0", db!="s1"})
  /
  sum by (environment_name, server_group) (hashstore{environment_name="${environment_name}", field="LenLogs", db!="s0", db!="s1"})
)
```
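That query boils down to `1 - sum(LenSet) / sum(LenLogs)`. The same number from raw per-store values (the figures below are made up for illustration):

```python
def overhead(len_set_bytes, len_logs_bytes):
    """Fraction of log-file bytes that are dead, i.e. not in the piece set.

    len_set_bytes:  per-store sums of live piece bytes (LenSet)
    len_logs_bytes: per-store sizes of the log files on disk (LenLogs)
    """
    total_logs = sum(len_logs_bytes)
    if total_logs == 0:
        return 0.0
    return 1 - sum(len_set_bytes) / total_logs

# Two stores, 1000 bytes of logs each, 950 and 960 bytes of live pieces:
print(overhead([950, 960], [1000, 1000]))  # ~0.045, i.e. 4.5% overhead
```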
But back to your question:
TL;DR:
- No, we don’t need the walkers for hashstore
- But we can fix the UI. I agree with you: it can be confusing, as it doesn’t show the dead bytes which can be deleted by the next compaction. I think this category should also be added.
I just created a backlog item: Display dead bytes of Hashstore on Storagenode console · Issue #7682 · storj/storj · GitHub