[Tech Preview] Hashstore backend for storage nodes

Just commenting here with some stats on hashstore “overhead”. Using the stbb tool, one can dig up stats on the hashtables and log files.

I migrated all my nodes to hashstore several months ago. I have now found that, ignoring the data in the log files reserved for trash, with the default compaction settings (alive fraction 0.25, probability power 2), ~12% of the log file space is “dead” (i.e. space that can be reclaimed via compaction) on US1 data. Nodes vary from 8% to 14%.

Obviously this can change over time. I’ve also observed that it’s rare for compaction to take more than an hour: across four months of node logs, only about 1% of compactions took over 1 h, and the vast majority finished in under 15 minutes. I am now bumping most of my nodes to an alive fraction of 0.5, which seems to have reduced the overhead to ~6%.

It would be nice to have a compaction routine with a configurable integer M that, on every compaction run, compacts at least the M most dead log files.
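To make the idea concrete, here is a minimal sketch in Go of the selection step such a routine would need. The `logFile` type and field names are hypothetical stand-ins, not the actual storagenode hashstore types; this only illustrates "pick the M log files with the most reclaimable bytes":

```go
package main

import (
	"fmt"
	"sort"
)

// logFile is a hypothetical stand-in for a hashstore log file's stats;
// the real storagenode types differ.
type logFile struct {
	name       string
	totalBytes int64
	deadBytes  int64 // bytes no longer referenced by the hashtable
}

// pickMostDead returns the m log files with the most reclaimable (dead)
// bytes, i.e. the ones a "compact at least the M most dead logs" pass
// would target first, regardless of the alive-fraction threshold.
func pickMostDead(logs []logFile, m int) []logFile {
	sorted := append([]logFile(nil), logs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].deadBytes > sorted[j].deadBytes
	})
	if m > len(sorted) {
		m = len(sorted)
	}
	return sorted[:m]
}

func main() {
	logs := []logFile{
		{"log-0001", 1 << 30, 300 << 20},
		{"log-0002", 1 << 30, 50 << 20},
		{"log-0003", 1 << 30, 120 << 20},
	}
	for _, lf := range pickMostDead(logs, 2) {
		fmt.Printf("%s: %d MiB dead\n", lf.name, lf.deadBytes>>20)
	}
}
```

The point of ranking by dead bytes rather than only comparing against the alive-fraction threshold is that the worst offenders always get reclaimed each run, which would put an upper bound on how long dead space can linger.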
