@elek Over the past few days I did some research of my own into the filewalker and ext4 & co. issue and came across your commit. Best filesystem for storj - #89 by Zetanova
Even your badger cache code will not solve the underlying issue of directory fragmentation.
A refactor of the directory layout would be required, with your badger used as an index-cache.
My idea would be to change the “./config/storage/blobs/” directory structure from a hash-set to a journal. This requires including a continuously increasing directory component, such as a date, in the path. The “./config/storage/trash/” layout already takes a similar approach.
This would force new writes to be placed close together in the same inode structure
and would reduce directory fragmentation. The downside is that direct access via the chunk hash would no longer be possible; this is where your badger cache would be used as the index and file-path locator.
Example:
instead of “/config/storage/blobs/6r2…aa/a2/fj4…cq.sj1”
the following path could be used:
“/config/storage/blobs/{yyyy-ww}/{bucketId}/fj4…cq.sj1”
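
A rough Go sketch of how such a journal prefix could be derived (the helper name and the zero-padded rolling counter are my own assumptions, based on the ISO week plus a per-directory bucket counter):

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// journalDir is a hypothetical helper that derives the journal-style
// directory prefix {yyyy-ww}/{bucketId} for a newly written piece.
// bucketId is just a rolling counter that would be incremented once
// the current directory holds too many files.
func journalDir(now time.Time, bucketId int) string {
	year, week := now.ISOWeek()
	return filepath.Join(fmt.Sprintf("%04d-%02d", year, week), fmt.Sprintf("%06d", bucketId))
}

func main() {
	// e.g. config/storage/blobs/2024-27/000042
	fmt.Println(filepath.Join("config/storage/blobs", journalDir(time.Now(), 42)))
}
```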
The chunk hash, modified date and bucketId could be looked up in the badger db
and the file located on disk. The bucketId would be a rolling counter, used only to cap the number of chunk files stored inside a single directory.
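
To illustrate the lookup side, here is a minimal sketch using github.com/dgraph-io/badger/v4 directly; the key/value layout (piece hash → relative journal path) is an assumption of mine, not the actual storagenode schema:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/blob-index"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Placeholder values mirroring the example path above.
	pieceHash := []byte("fj4...cq")
	relPath := []byte("2024-27/000042/fj4...cq.sj1")

	// On write: remember where the piece was journaled.
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set(pieceHash, relPath)
	}); err != nil {
		log.Fatal(err)
	}

	// On read: resolve the piece hash back to its on-disk location.
	if err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(pieceHash)
		if err != nil {
			return err
		}
		val, err := item.ValueCopy(nil)
		if err != nil {
			return err
		}
		fmt.Printf("piece %s -> config/storage/blobs/%s\n", pieceHash, val)
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```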
The badger db could be recreated at any time by filewalking, but this would be much faster than with the current layout because the directory inode and file reads would be nearly sequential on the drive.
On node startup only the latest directory “/config/storage/blobs/{yyyy-ww}/{bucketId}/” needs to be checked, because only it could contain unindexed changed files.
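
And a sketch of the rebuild/startup scan under the same assumptions: walking the journal layout in directory order keeps the reads nearly sequential, and on startup only the latest “{yyyy-ww}/{bucketId}” directory would be passed as the root. The names reindex and indexPiece are hypothetical; the callback stands in for the badger write shown above:

```go
package main

import (
	"fmt"
	"io/fs"
	"log"
	"path/filepath"
	"strings"
)

// reindex walks the given journal directory and (re)indexes every
// .sj1 file it finds. For a full rebuild, root would be the whole
// "config/storage/blobs" tree; on startup only the latest
// "{yyyy-ww}/{bucketId}" directory needs to be scanned.
func reindex(root string, indexPiece func(relPath string) error) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(d.Name(), ".sj1") {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		return indexPiece(rel)
	})
}

func main() {
	// Startup check: only the newest journal directory is scanned.
	latest := "config/storage/blobs/2024-27/000042"
	if err := reindex(latest, func(rel string) error {
		fmt.Println("indexing", rel)
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```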