Disk usage discrepancy?

On Linux, the storagenode uses the OS-provided interfaces for traversing a filesystem: opendir(), readdir(), and stat(). This is essentially the same way that OS tools (e.g., du) do it. I'm not yet aware of anyone demonstrating that du -s is significantly faster than the non-lazy storagenode filewalker under the same load conditions. If that turns out to be the case, there may be some deeper magic we can leverage (e.g., going lower-level than the Golang standard library and using openat() and fstatat() directly might make a big difference, or we could reimplement os.(*File).ReadDir() so we can tune the buffer sizes passed to getdents64()). A rough sketch of that idea is below.
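To illustrate what "lower-level" could look like, here is a minimal, hypothetical sketch (not storagenode code) that sums the sizes of regular files in a single directory by calling getdents64() through golang.org/x/sys/unix with a tunable buffer, and stat-ing entries with fstatat() relative to the already-open directory fd instead of re-resolving full paths. The buffer size constant and function names are made up for the example.

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

// direntBufSize is a tuning knob: a larger buffer means fewer
// getdents64() syscalls per directory at the cost of more memory.
const direntBufSize = 1 << 20 // 1 MiB

// sizeOfDir returns the total size of regular files directly inside path.
// It is a sketch, not a recursive walker.
func sizeOfDir(path string) (int64, error) {
	fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECTORY, 0)
	if err != nil {
		return 0, err
	}
	defer unix.Close(fd)

	var total int64
	buf := make([]byte, direntBufSize)
	for {
		n, err := unix.Getdents(fd, buf) // getdents64(2) on Linux
		if err != nil {
			return total, err
		}
		if n == 0 {
			break // no more directory entries
		}
		// ParseDirent extracts entry names, skipping "." and "..".
		_, _, names := unix.ParseDirent(buf[:n], -1, nil)
		for _, name := range names {
			var st unix.Stat_t
			// fstatat() resolves name relative to the open directory fd,
			// avoiding a repeated lookup of the parent path.
			if err := unix.Fstatat(fd, name, &st, unix.AT_SYMLINK_NOFOLLOW); err != nil {
				continue // entry may have been removed concurrently
			}
			if st.Mode&unix.S_IFMT == unix.S_IFREG {
				total += st.Size
			}
		}
	}
	return total, nil
}

func main() {
	size, err := sizeOfDir("/tmp")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("bytes in regular files:", size)
}
```

Whether this actually beats the stdlib (which already uses getdents64 underneath) would need to be measured under the same load conditions.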

If your problem is only with the lazy filewalker: if the lazy filewalker takes more than, say, 5x the time of the non-lazy one, then your system is underprovisioned. It does not have enough I/O headroom to support the number of nodes it is running.

On Windows, we may be doing all sorts of things wrong. We have a few engineers who are experienced Windows programmers, but I don’t think we’ve had time for them to take a close look at file traversal and see what we could be doing to make it faster. Or, yes, if we can’t reach the same performance as built-in tools (which seems unlikely), then we could shell out to those tools and let them do the work instead.

I don’t know of any code in the storagenode that purports to prohibit the use of the filesystem cache. It’s possible Golang’s stdlib is doing something, but that would be surprising. Do you have any hard data you can share indicating this problem, so we can investigate further?
