Just commenting here with some stats on hashstore “overhead”. Using the stbb tool you can dig into stats for the hashtables and log files.
I migrated all my nodes to hashstore several months ago. Ignoring the space in the log files reserved for trash, and with the default compaction settings (alive fraction 0.25, probability power 2), I observe that ~12% of the log file space is “dead” (i.e. space that can be reclaimed via compaction) on US1 data. Individual nodes range from 8% to 14%.
Obviously this can change over time. I’ve also observed it’s rare for compaction to take more than an hour: looking at node logs over 4 months, only ~1% of compactions take over 1h, and the vast majority finish in under 15 minutes. I am now bumping most of my nodes to an alive fraction of 0.5, which seems to have reduced the overhead to ~6%.
It would be nice to have a compaction option, with a configurable integer M, that compacts at least the M most-dead log files on every compaction run.
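To make the idea concrete, here is a rough sketch of the selection logic I have in mind. All names here (`LogFile`, `pick_compaction_targets`, the `dead_fraction` field) are hypothetical and not the actual storagenode code; it just illustrates the rule “everything past the alive-fraction threshold, plus at least the M most-dead files”:

```python
from dataclasses import dataclass

@dataclass
class LogFile:
    # Hypothetical stand-in for a hashstore log file and its stats.
    path: str
    dead_fraction: float  # fraction of bytes reclaimable by rewriting


def pick_compaction_targets(logs, alive_fraction=0.25, m=2):
    """Pick log files to rewrite on a compaction run.

    Default rule: rewrite files whose alive fraction is below the
    threshold (i.e. dead fraction above 1 - alive_fraction).
    Proposed addition: always include at least the m most-dead files,
    even if none of them cross the threshold yet.
    """
    by_deadness = sorted(logs, key=lambda lf: lf.dead_fraction, reverse=True)
    targets = [lf for lf in by_deadness
               if 1.0 - lf.dead_fraction < alive_fraction]
    for lf in by_deadness[:m]:
        if lf not in targets:
            targets.append(lf)
    return targets


logs = [LogFile("a.log", 0.9), LogFile("b.log", 0.1), LogFile("c.log", 0.5)]
# "a.log" crosses the 0.25 alive-fraction threshold; "c.log" is pulled in
# only because it is among the 2 most-dead files.
print([lf.path for lf in pick_compaction_targets(logs, 0.25, m=2)])
```

The point is that even on a node where nothing is below the threshold, every run would still reclaim some space from the worst offenders, keeping the steady-state overhead bounded.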