Potential GC speedup by reversing lstat and checking Bloom filter?

Toyoo · January 7, 2023, 6:19am

I was casually reading storage node source code, wanting to check the details of bloom filter implementation, when I noticed one thing in the file walker process when doing GC. Currently this process goes like:

For each 2-letter subdirectory of the satellite’s blobs directory… (walkNamespaceInPath)
1. For each file in that subdirectory… (walkNamespaceWithPrefix)
  1. lstat() it,
  2. construct the BlobInfo object, which simply records the piece ID as decoded from the file name, storage version (also decoded from the file name) and the result of lstat(),
  3. call the walkFunc, which for GC is defined here:
    1. check the file’s modification time, as returned by lstat() above,
    2. check if the piece exists in the Bloom filter.

Now, lstat() needs to fetch data from the file’s inode, which is an additional I/O operation for each file. But checking the Bloom filter is just an in-memory operation which costs almost nothing. It would probably make sense to reverse the order of these checks so that lstat() is only called if the piece does not exist in the Bloom filter. And given the expectation that it is rare that a piece does not exist in the Bloom filter, this reversion should actually eliminate most of random I/O done during the GC process. The only I/O left would be:

directory scans, which can be made sequential with defragmentation,
actual garbage file deletions, which ought to be rare.

Now, I don’t know golang enough to be perfectly sure of this analysis. Besides, it’s 7am here, and I haven’t slept this night. But if I’m right, this change should make the file walker process for collecting garbage a breeze. What do you think?

Pac · January 7, 2023, 9:36pm

I think that if you’re right, that’s a brilliant idea

thepaul · January 9, 2023, 6:31pm

Thanks, this is a great idea! It might indeed help a lot. Deletions aren’t quite as rare as you might think (they’ll include any pieces stored which don’t eventually get used in a segment—that is, when your node lost the long tail race, or the uplink couldn’t contact enough nodes, or the uplink died or was cut off unexpectedly while uploading). But still, “number of files to trash” should generally be much less than “number of files to keep”, so this could be really big.

One complicating factor is that, currently, walkNamespaceWithPrefix calls lstat so that it can skip over directories and special files. If we start calling the walkFunc with entries that are named like blobs but are actually directories (or named pipes, or devices, or symlinks, etc), we have to make sure that everything that uses walkNamespaceWithPrefix underneath can deal with that possibility without crashing messily. I think there are only 4 different ways that function gets used, and I think they can all deal with weird files without too much fuss; it just has to be done carefully.

I’ll get this work scheduled!

(Edit: storagenode/gc: check bloom filter membership before calling lstat · Issue #5454 · storj/storj · GitHub , if you’d like to follow along)

Toyoo · February 9, 2023, 11:33pm

Yay, it’s merged!

snorkel · February 11, 2023, 4:19pm

So, in which storagenode version will it be implemented? I want to see the difference.

Toyoo · February 11, 2023, 4:41pm

The commit is listed in this post, so I assume it’s already released.

Roberto · February 12, 2023, 12:01pm

There has been a lot of throwing these days and my little nas hasn’t suffered as usual, on the contrary… I believe your suggestion worked

snorkel · February 12, 2023, 4:19pm

I don’t believe that you running the newest version with that change. I see that is in a pre release mode.

Roberto · February 12, 2023, 5:29pm

I have 1.72.2 in both of my nodes