As you can see, more than 3000 empty folders remain in the old blobs directory after the node migrated to the new hashstore storage format. Essentially, all the data has already been transferred, but the old directory structure remains untouched.
Why Does This Happen?
Apparently, when migrating to the new storage format, the system only transfers files but doesn’t remove the old directory structure. This is typical when updating storage software, where developers prefer a safer approach - not automatically deleting old structures in case a rollback is needed.
How Does This Affect Performance?
Even if directories are empty, they still create a significant load on the file system:
Increased metadata operation time: each directory, even an empty one:
Requires a separate inode in the file system
Takes up space in the directory table
Is processed during listing and search requests
Approximate load calculation:
Each directory traversal operation (find, du, backup) must process 3077 extra elements
For standard search operations, this can add tens of milliseconds of delay
With a large number of requests, these delays accumulate
Load on the directory cache:
Linux caches directory information in the dentry cache
3000+ directories occupy valuable space in this cache
They displace more frequently used entries, leading to additional disk accesses (a quick check follows right after this list)
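To get a rough idea of the scale on a particular node, here is a minimal check (assuming standard Linux tools and the same /storage/blobs path as above):
# Count the leftover empty directories
find /storage/blobs/ -type d -empty | wc -l
# Dentry cache state: the first two fields are the total and unused cached entries
cat /proc/sys/fs/dentry-state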
Performance Test
I ran a simple test on my node to measure the time of basic operations:
# Search time in a directory with the old structure
time find /storage/blobs/ -type f -name "*"
real 0m1.843s
# The same on a clean test directory with the same number of files
time find /storage/test/ -type f -name "*"
real 0m0.327s
The difference is more than 5 times! And this is just for one search operation.
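To be fair, numbers like these only mean much when both runs start from a cold cache. A sketch of how I would level the playing field (run as root, and keep in mind that dropping caches briefly slows down the whole system):
# Flush dirty data, then drop the page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
# Repeat the measurement with nothing cached
time find /storage/blobs/ -type f > /dev/null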
Should I Clean It Up?
If the system is stable on the new storage format and you’re confident that a return to the old format isn’t planned, I recommend the following:
Make a backup of the metadata (a list of the directories; see the sketch below)
Remove empty directories using a script:
find /storage/blobs/ -type d -empty -delete
Important: Perform this during a period of low load and have a rollback plan in case of unforeseen circumstances.
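A minimal sketch of both steps, assuming the same /storage/blobs path; the backup file name is only an example, and the -print run is a dry run so you can review what would be removed before deleting anything:
# 1. Back up the directory list (example location for the backup file)
find /storage/blobs/ -type d > /root/blobs-dirs-backup.txt
# 2. Dry run: only print the empty directories that would be removed
find /storage/blobs/ -type d -empty -print
# 3. Actual cleanup, after reviewing the output above
find /storage/blobs/ -type d -empty -delete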
Questions for the Community
Has anyone else encountered a similar issue?
Are there official recommendations for cleaning up old directories?
What other side effects might occur during such cleanup?
I would appreciate any information and experience!
I do not see a problem: after the first traversal these entries will be cached in memory, and there likely will not be any noticeable difference.
If you are still concerned, you may delete the empty folders. However, if you disable the migration to hashstore, the node will try to use the piecestore backend, and I do not remember whether it would re-create the needed folder structure itself or whether it needs to be re-created with a storagenode setup command.
I’m also slowly migrating to hashstore, and on bigger nodes deleting the empty blobs directories after the migration can take as long as a whole day.
In my case the FS is ext4, and listing a single empty blobs directory can easily take a minute, for example. This looks to be a quirk of ext4: if a directory initially contained a significant number of files which were then deleted, operations on it remain significantly slower, apparently because the directory itself stays large even after the entries are removed.
This was on a system with plenty of RAM, but after the migration none of the blobs directories were cached, so doing any operation on them was very expensive.
I would suggest modifying the code to delete the directories as they get migrated to hashstore.
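If the slowdown really comes from bloated ext4 directories, one option, assuming you can stop the node and unmount the filesystem for a while, is to let e2fsck rebuild and compact the directory indexes; /dev/sdX1 and /storage here are placeholders for your actual device and mount point:
# Stop the node, then unmount the storage filesystem
umount /storage
# -f forces a full check, -D optimizes (re-indexes and compacts) directories
e2fsck -f -D /dev/sdX1
# Re-mount (assuming an fstab entry exists)
mount /storage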
They should be kept in case you decide to revert back to piecestore, and if they are only created at setup. I recently reverted back to piecestore, because hashstore doesn’t give me any confidence. Maybe many will follow.
So, at least until hashstore becomes a must, they should be kept, or the node should recreate them when needed.
Again, a negligible amount of RAM is needed to cache these directories while they’re empty.
Though, please consider that hashstore is still not considered production-ready, and it would be nice if the production version cleaned them up, if only for the sake of easier troubleshooting in the future.
I tried counting the number of folders on ext4 — the process ran all night and still didn’t finish, so the total number of directories on the node remains unknown. The first count was done on xfs, where I at least got some results. In the end, I wiped everything, keeping only the blobs folder. By the way, I noticed that transferring the node with hashstore is much more convenient and faster — the data structure there is clearly better optimized for such tasks.
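In case it helps someone: counting can be kept fast even on bloated directories if you don’t descend into the leaf directories at all. A sketch, assuming the standard piecestore layout of blobs/<satellite id>/<two-character prefix>/:
# Count satellite and prefix directories without reading their contents
find /storage/blobs/ -maxdepth 2 -type d | wc -l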