Hashstore - file fragmentation?

Hi,

If I understood the concept correctly, hashstore uses container files to store the customer data. Those files can be up to 1 GB in size. Is there any mechanism to avoid fragmentation as those files grow and shrink? Wouldn’t it be better to create the 1 GB files upfront and fill them later?
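For reference, preallocating space upfront is what `posix_fallocate` provides on Linux. A minimal sketch, purely illustrative and not anything hashstore actually does, using a 1 MiB file so the demo stays small (a real container would be 1 GiB):

```python
import os
import tempfile

# Preallocate disk blocks for a container file up front, so the
# filesystem can reserve (ideally contiguous) space before any
# payload data is written. 1 MiB here; a real container would be 1 GiB.
SIZE = 1 * 1024 * 1024

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    # posix_fallocate reserves blocks and extends the file to SIZE
    # without writing any data (POSIX/Linux only).
    os.posix_fallocate(f.fileno(), 0, SIZE)

print(os.stat(path).st_size)  # 1048576
os.unlink(path)
```

Whether this actually reduces fragmentation depends on the filesystem; ext4, for example, already uses extents and delayed allocation to keep sequentially written files contiguous.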

Regards

Hashstore files are tracked to know how much data has been deleted from them, and once the percentage is high enough they get compacted/rewritten. So I believe fragmentation is managed through those compaction events, which serve as a chance to rewrite large slabs of active data sequentially.

Then you would need to allocate all the space from the very beginning, unpaid of course. :money_mouth_face:

These files are written sequentially and never reused for new files. On compaction, a new container file is created and data is copied from the old one, then the old one is removed. Fragmentation should not be a problem on any modern file system/operating system.
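A toy sketch of that compact-and-copy idea (the record layout, names, and threshold are my own assumptions, not hashstore’s actual code):

```python
# Toy model of log-style container compaction (illustrative only).

def compact(container):
    """Copy live records into a fresh container, sequentially."""
    # Records marked deleted are simply skipped, so the new
    # container is written front-to-back with no holes.
    return [rec for rec in container if not rec["deleted"]]

def deleted_ratio(container):
    """Fraction of records in the container that are dead."""
    dead = sum(1 for rec in container if rec["deleted"])
    return dead / len(container)

container = [
    {"key": "a", "data": b"...", "deleted": False},
    {"key": "b", "data": b"...", "deleted": True},
    {"key": "c", "data": b"...", "deleted": True},
    {"key": "d", "data": b"...", "deleted": False},
]

# Once enough of a container is dead, rewrite it into a new one
# and drop the old file.
if deleted_ratio(container) >= 0.5:
    container = compact(container)

print([rec["key"] for rec in container])  # ['a', 'd']
```

The key point is that the new container is written in one sequential pass, so the active data ends up contiguous regardless of how scattered the deletions were.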

But you have to leave some free space on the drive so that the fs/OS can work with those 1 GB containers and allocate contiguous space for each. I wonder how each fs manages a 90% full drive with hashstore, and whether ext4 is still the king regarding fragmentation? :face_with_monocle:

Shouldn’t be difficult to make an experiment. Want to try?

Go for it… :sweat_smile:
I won’t.

I don’t remember caring about fragmentation since… maybe when 500GB HDDs were new?

A node on a fragmented filesystem is still so much faster than the average Internet connection it’s attached to that it’s not worth worrying about.

But it is not faster than an unfragmented node, against which it competes.

IIRC the piece need not be fsync’d to disk to win the race.

The average segment size is under 4 KiB, and the default block size on ext4 is 4 KiB. Therefore fragmentation is literally irrelevant. On ZFS the situation is even better.
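A quick sanity check of that arithmetic, assuming a 4 KiB block size: a segment no larger than one block occupies exactly one block, and a single block cannot be fragmented.

```python
import math

BLOCK = 4096  # default ext4 block size in bytes

def blocks_needed(piece_size):
    """Number of filesystem blocks a piece of this size occupies."""
    return math.ceil(piece_size / BLOCK)

# A segment under 4 KiB always fits in a single block,
# so there is nothing to fragment.
print(blocks_needed(3500))   # 1
print(blocks_needed(4096))   # 1
print(blocks_needed(4097))   # 2
```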

Stop worrying about fragmentation. It does not matter.