Empty Directories Problem in /storage/blobs/: Should They Be Cleaned?

The Issue

Hello everyone! I discovered an interesting situation on my small node:

/storage/blobs/
----------------------------------------
Total directories: 3082
Empty directories: 3077
Non-empty directories: 5

As you can see, more than 3000 empty folders remain in the old blobs directory after the node migrated to the new hashstore storage format. All the data has already been transferred, but the old directory structure remains untouched.
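
For anyone who wants to reproduce counts like these on their own node, here is a minimal sketch. It assumes GNU find, and `count_dirs` is a helper name made up for this post, not a storagenode tool. It does not count the root directory itself, which may or may not match how the numbers above were produced.

```shell
#!/usr/bin/env bash
# Count total, empty and non-empty subdirectories under a root directory.
# Assumes GNU find (for -empty). count_dirs is a hypothetical helper name.
count_dirs() {
    local root="$1"
    local total
    local empty
    # -mindepth 1 leaves the root itself out of the count;
    # $(( ... )) strips any padding that wc may add on some systems
    total=$(( $(find "$root" -mindepth 1 -type d | wc -l) ))
    empty=$(( $(find "$root" -mindepth 1 -type d -empty | wc -l) ))
    echo "Total directories: $total"
    echo "Empty directories: $empty"
    echo "Non-empty directories: $((total - empty))"
}
```

Usage would be something like `count_dirs /storage/blobs`. On very large trees this can take a long time, because every directory is visited twice.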

Why Does This Happen?

Apparently, during migration to the new storage format, the system moves only the files and does not remove the old directory structure. This is typical of storage software updates, where developers prefer the safer approach of not automatically deleting old structures in case a rollback is needed.

How Does This Affect Performance?

Even if directories are empty, they still create a significant load on the file system:

  1. Increased metadata operation time. Each directory, even an empty one:

    • Requires a separate inode in the file system
    • Takes up space in the directory table
    • Is processed during listing and search requests
  2. Approximate load calculation:

    • Each directory traversal operation (find, du, backup) must process 3077 extra elements
    • For standard search operations, this can add tens of milliseconds of delay
    • With a large number of requests, these delays accumulate
  3. Load on the directory cache:

    • Linux caches directory information in the dentry cache
    • 3000+ directories occupy valuable space in this cache
    • They displace current data, leading to additional disk accesses

Performance Test

I ran a simple test on my node to measure the time of basic operations:

# Search time in a directory with the old structure
time find /storage/blobs/ -type f -name "*"
real    0m1.843s

# The same on a clean test directory with the same number of files
time find /storage/test/ -type f -name "*"
real    0m0.327s

The difference is more than fivefold! And this is just for one search operation.
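
A single timing can be skewed by whatever happens to be in the file system cache, so it may be worth repeating the traversal a few times and comparing the runs. A rough sketch, assuming GNU find and GNU date (for nanosecond timestamps); `time_traversal` is a made-up helper name:

```shell
#!/usr/bin/env bash
# Time the same file search several times in a row.
# The first run may hit the disk; later runs are usually served from cache.
# Assumes GNU date (%N). time_traversal is a hypothetical helper name.
time_traversal() {
    local root="$1"
    local runs="${2:-3}"
    local i=1
    local start
    local end
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)                  # nanoseconds since the epoch
        find "$root" -type f > /dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

For example, `time_traversal /storage/blobs 3` followed by `time_traversal /storage/test 3`. To compare cold-cache numbers on Linux, the caches would have to be dropped first as root (`sync; echo 3 > /proc/sys/vm/drop_caches`), which also slows down everything else on the machine for a while.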

Should I Clean It Up?

If the system is stable on the new storage format and you’re confident that a return to the old format isn’t planned, I recommend the following:

  1. Make a backup of the metadata (directory list)
  2. Remove empty directories using a script:
find /storage/blobs/ -type d -empty -delete

Important: Perform this during a period of low load and have a rollback plan in case of unforeseen circumstances.
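
The two steps can be combined into a small script with a simple rollback. This is only a sketch under a few assumptions: GNU find, directory names without newlines, and a list file stored outside the tree being cleaned. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Save the full directory list, then delete empty directories.
# Assumes GNU find and paths without newlines. Helper names are hypothetical.
backup_and_clean() {
    local root="$1"
    local list="$2"
    # Step 1: record every directory path so the tree can be recreated later
    find "$root" -mindepth 1 -type d > "$list" || return 1
    # Step 2: remove empty directories. -delete implies depth-first order,
    # so a parent left empty by this step is removed as well.
    # -mindepth 1 protects the root directory itself.
    find "$root" -mindepth 1 -type d -empty -delete
}

# Rollback: recreate every directory recorded in the list file.
restore_dirs() {
    local list="$1"
    local dir
    while IFS= read -r dir; do
        mkdir -p -- "$dir"
    done < "$list"
}
```

For example, `backup_and_clean /storage/blobs /root/blobs-dirs.txt` (the list file path is just an example), and `restore_dirs /root/blobs-dirs.txt` to bring the empty tree back. Restored directories get default ownership and permissions, so those may need adjusting afterwards.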

Questions for the Community

  • Has anyone else encountered a similar issue?
  • Are there official recommendations for cleaning up old directories?
  • What other side effects might occur during such cleanup?

I would appreciate any information and experience!

I do not see a problem: after one traversal the directories will be cached in memory, and there likely will not be any difference.
If you are still concerned, you may delete the empty folders. However, if you disable the migration to hashstore, the node will try to use the piecestore backend, and I do not remember whether it would re-create the needed folder structure or whether it would need to be re-created with a storagenode setup command.

I’m also slowly migrating to hashstore, and on bigger nodes deleting the empty blobs directories after the migration can take a whole day.
In my case the FS is ext4, and listing one of the empty blobs directories can easily take a minute, for example. This appears to be a characteristic of ext4: if directories initially contained a large number of files that were then deleted, operations on them slow down significantly.
This was on a system with plenty of RAM, but after the migration none of the blobs directories were cached, so doing any operation on them was very expensive.
I would suggest modifying the code to delete the directories as they get migrated to hashstore.
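
One way to check whether this is what is happening on a given node is to look at the size of the empty directories themselves. My understanding is that ext4 normally does not shrink a directory's on-disk size when its entries are deleted, so a directory that once held many files stays large even when empty; treat that as an assumption rather than a confirmed diagnosis. A sketch assuming GNU find (for -printf), with a made-up helper name:

```shell
#!/usr/bin/env bash
# List the empty directories whose own on-disk directory file is largest.
# Assumes GNU find (-printf). largest_empty_dirs is a hypothetical helper name.
largest_empty_dirs() {
    local root="$1"
    local limit="${2:-10}"
    # %s = size of the directory entry itself in bytes, %p = path
    find "$root" -mindepth 1 -type d -empty -printf '%s %p\n' \
        | sort -rn \
        | head -n "$limit"
}
```

For example, `largest_empty_dirs /storage/blobs 20`. Empty directories reported with sizes far above one file system block are the likely candidates for slow listing.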

They should be kept in case you decide to revert to piecestore, if they are only created at setup. I recently reverted to piecestore because hashstore doesn’t give me any confidence. Maybe many will follow.
So, at least until hashstore becomes mandatory, they should be kept, or the node should recreate them when needed.

All subfolders in the blobs folder are created when needed, not by setup. So deleting empty folders will not cause problems.

This is less than a megabyte total.

Again, a negligible amount of RAM is needed to cache these when they’re empty.

Though, please consider that hashstore is still not considered production-ready, and it would be nice if the production version cleaned them up, if only for the sake of easier troubleshooting in the future.

I tried counting the number of folders on ext4, but the process ran all night and still didn’t finish, so the total number of directories on the node remains unknown. The first count was done on xfs, where I at least got some results. In the end, I wiped everything, keeping only the blobs folder. By the way, I noticed that transferring a node with hashstore is much more convenient and faster; the data structure there is clearly better optimized for such tasks.

Yeah, migrating a node with hashstore should be wildly better. Handling thousands of files versus millions makes a big difference in file system overhead.
