[Tech Preview] Hashstore backend for storage nodes

Just commenting here with some stats on hashstore “overhead”. Using the stbb tool, one can dig up stats on the hashtables and log files.

I migrated all my nodes to hashstore several months ago. I have now found that, ignoring the data in the log files reserved for trash, with the default compaction settings (alive fraction 0.25, probability power 2), ~12% of the log file space is “dead” (i.e. space that can be reclaimed via compaction) on US1 data. Nodes vary from 8% to 14%.

Obviously this can change over time. I’ve also observed that it’s rare for compaction to take more than an hour: across four months of node logs, only about 1% of compactions took over 1 h, and the vast majority finished in under 15 minutes. I am now bumping most of my nodes to an alive fraction of 0.5, which seems to have reduced the overhead to ~6%.

It would be nice to have a compaction routine with a configurable integer M that, on every compaction run, compacts at least the M most dead log files.
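To make the idea concrete, here is a minimal sketch in Go of the selection step such a routine would need. The `logFile` type and field names are hypothetical stand-ins, not the actual storagenode hashstore types; this only illustrates "pick the M log files with the most reclaimable bytes":

```go
package main

import (
	"fmt"
	"sort"
)

// logFile is a hypothetical stand-in for a hashstore log file's stats;
// the real storagenode types differ.
type logFile struct {
	name       string
	totalBytes int64
	deadBytes  int64 // bytes no longer referenced by the hashtable
}

// pickMostDead returns the m log files with the most reclaimable (dead)
// bytes, i.e. the ones a "compact at least the M most dead logs" pass
// would target first, regardless of the alive-fraction threshold.
func pickMostDead(logs []logFile, m int) []logFile {
	sorted := append([]logFile(nil), logs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].deadBytes > sorted[j].deadBytes
	})
	if m > len(sorted) {
		m = len(sorted)
	}
	return sorted[:m]
}

func main() {
	logs := []logFile{
		{"log-0001", 1 << 30, 300 << 20},
		{"log-0002", 1 << 30, 50 << 20},
		{"log-0003", 1 << 30, 120 << 20},
	}
	for _, lf := range pickMostDead(logs, 2) {
		fmt.Printf("%s: %d MiB dead\n", lf.name, lf.deadBytes>>20)
	}
}
```

The point of ranking by dead bytes rather than only comparing against the alive-fraction threshold is that the worst offenders always get reclaimed each run, which would put an upper bound on how long dead space can linger.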
