Hashstore - file fragmentation?

Hi,

If I understood the concept correctly, hashstore uses container files to store the customer data. Those files can be up to 1 GB in size. Is there any mechanism to avoid fragmentation as those files grow and shrink? Wouldn’t it be better to create the 1 GB files upfront and fill them later?
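For reference, preallocating space upfront is what `posix_fallocate` provides on Linux. A minimal sketch, purely illustrative and not anything hashstore actually does, using a 1 MiB file so the demo stays small (a real container would be 1 GiB):

```python
import os
import tempfile

# Preallocate disk blocks for a container file up front, so the
# filesystem can reserve (ideally contiguous) space before any
# payload data is written. 1 MiB here; a real container would be 1 GiB.
SIZE = 1 * 1024 * 1024

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    # posix_fallocate reserves blocks and extends the file to SIZE
    # without writing any data (POSIX/Linux only).
    os.posix_fallocate(f.fileno(), 0, SIZE)

print(os.stat(path).st_size)  # 1048576
os.unlink(path)
```

Whether this actually reduces fragmentation depends on the filesystem; ext4, for example, already uses extents and delayed allocation to keep sequentially written files contiguous.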

Regards

Hashstore files are tracked to know how much data has been deleted from them, and once the percentage is high enough they get compacted/rewritten. So I believe fragmentation is managed through those compaction events, which serve as a chance to rewrite large slabs of active data sequentially.

Then you would need to allocate all the space from the very beginning, unpaid of course. :money_mouth_face:

These files are written sequentially and never reused for new files. On compaction, a new container file is created and data is copied from the old one, then the old one is removed. Fragmentation should not be a problem on any modern file system/operating system.
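A toy sketch of that compact-and-copy idea (the record layout, names, and threshold are my own assumptions, not hashstore’s actual code):

```python
# Toy model of log-style container compaction (illustrative only).

def compact(container):
    """Copy live records into a fresh container, sequentially."""
    # Records marked deleted are simply skipped, so the new
    # container is written front-to-back with no holes.
    return [rec for rec in container if not rec["deleted"]]

def deleted_ratio(container):
    """Fraction of records in the container that are dead."""
    dead = sum(1 for rec in container if rec["deleted"])
    return dead / len(container)

container = [
    {"key": "a", "data": b"...", "deleted": False},
    {"key": "b", "data": b"...", "deleted": True},
    {"key": "c", "data": b"...", "deleted": True},
    {"key": "d", "data": b"...", "deleted": False},
]

# Once enough of a container is dead, rewrite it into a new one
# and drop the old file.
if deleted_ratio(container) >= 0.5:
    container = compact(container)

print([rec["key"] for rec in container])  # ['a', 'd']
```

The key point is that the new container is written in one sequential pass, so the active data ends up contiguous regardless of how scattered the deletions were.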

But you have to leave some free space on the drive so that the fs/OS can work with those 1 GB containers and allocate contiguous space for each. I wonder how each fs manages a 90% full drive with hashstore, and whether ext4 is still the king regarding fragmentation? :face_with_monocle:

Shouldn’t be difficult to make an experiment. Want to try?

Go for it… :sweat_smile:
I won’t.

I don’t remember caring about fragmentation since… maybe when 500GB HDDs were new?

A node on a fragmented filesystem is still so much faster than the average Internet connection it’s attached to that it’s not worth worrying about.

But it is not faster than an unfragmented node, against which it competes.

IIRC the piece need not be fsync’d to disk to win the race.

The average segment size is under 4 KiB, and the default block size on ext4 is 4 KiB. Therefore fragmentation is literally irrelevant. On ZFS the situation is even better.
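A quick sanity check of that arithmetic, assuming a 4 KiB block size: a segment no larger than one block occupies exactly one block, and a single block cannot be fragmented.

```python
import math

BLOCK = 4096  # default ext4 block size in bytes

def blocks_needed(piece_size):
    """Number of filesystem blocks a piece of this size occupies."""
    return math.ceil(piece_size / BLOCK)

# A segment under 4 KiB always fits in a single block,
# so there is nothing to fragment.
print(blocks_needed(3500))   # 1
print(blocks_needed(4096))   # 1
print(blocks_needed(4097))   # 2
```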

Stop worrying about fragmentation. It does not matter.