Disk usage discrepancy?

Ah, if we’re talking about the manipulation of the files, then yes, as you rightly point out, we have lots of room for improvement. I was only talking above about the directory traversal piece by itself.

Your suggestions are good ones, and some of them we’ve talked about doing before. Some are harder than others. For example, giving up on the temp dir would mean we need some other way to differentiate partial uploads from full, intact blobs. We could try to use a database to keep track of which files are partial and which are not, but we’ve had lots of problems from depending too much on SQLite databases or other single-file dbs. We could use different filenames, but then the rename() step still needs to do multiple writes (it does a link and unlink, unless Linux is able to overwrite a dirent with a new filename now). The directories would also have a lot more entries in them to go through, which could hurt the performance of GET operations.
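To make the temp dir idea concrete, here is a rough sketch of the general pattern in Python. It is not our actual code: `store_blob`, `blob_dir` and `temp_dir` are names made up for illustration, and it assumes both directories are on the same filesystem so the final move is a plain rename.

```python
import os
import tempfile


def store_blob(blob_dir: str, temp_dir: str, name: str, data: bytes) -> str:
    """Write an upload into temp_dir, then move it into blob_dir when complete.

    Hypothetical helper: anything found in blob_dir is a full, intact blob,
    and anything left behind in temp_dir is a partial upload.
    """
    os.makedirs(blob_dir, exist_ok=True)
    os.makedirs(temp_dir, exist_ok=True)

    # Create a uniquely named file in the temp dir for the in-progress upload.
    fd, tmp_path = tempfile.mkstemp(dir=temp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the move
        final_path = os.path.join(blob_dir, name)
        # Move the finished file into place (assumes the same filesystem).
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed upload never shows up in blob_dir; clean up the partial file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

The point of the separate directory is that a GET only ever has to look in `blob_dir`, so it never sees partial files and the blob directories don't pick up extra entries.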

This is probably true, but we haven’t yet found the ideal way to do this that (a) doesn’t involve basically reimplementing the filesystem; (b) doesn’t hurt performance (both TTFB and throughput) for GET operations; and (c) still provides ranged reads and so on. We’ve had some promising results with experiments using LSM-tree storage, but there are still big challenges to overcome.

Regardless of all that, there’s a lot we can do, and as you note, it’s just a matter of assigning resources to the problem.

I think this information can be put together: nodes send build hashes with their telemetry data. If we collected the hashes of all released builds together with their target platforms, we’d at least have a good idea of the percentage of Windows nodes. But that would take a nontrivial amount of work. I don’t think we have anything better available, at least within the sphere of the project I’m familiar with. Download counts on GitHub are probably as good as anything. That would be one of the first things to fix once someone can be dedicated to improving node performance.
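For what it’s worth, the build-hash idea would look roughly like the sketch below once the hash-to-platform table exists. This is only an illustration: `platform_share`, the hash values and the platform labels are all invented, and I’m assuming the telemetry can be reduced to one build hash per node.

```python
from collections import Counter


def platform_share(telemetry_hashes, build_platforms):
    """Estimate the fraction of nodes per platform from reported build hashes.

    telemetry_hashes: iterable with one build hash per node (assumed shape).
    build_platforms: dict mapping a released build's hash to its target platform.
    Hashes that are not in the table are counted under "unknown".
    """
    counts = Counter(build_platforms.get(h, "unknown") for h in telemetry_hashes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {platform: n / total for platform, n in counts.items()}


# Made-up example data: two released builds and four reporting nodes.
example_builds = {"aaa111": "windows/amd64", "bbb222": "linux/amd64"}
example_nodes = ["aaa111", "aaa111", "bbb222", "ccc333"]
shares = platform_share(example_nodes, example_builds)
```

The hard part is building the table of released hashes, not the counting. Nodes running builds that were never released would land in the "unknown" bucket.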
