The repair job is currently hitting a lot of storage nodes that have no space left but still report free space to the satellite. This can happen when the free space cache becomes inaccurate. I have seen this on my own storage node. The free space calculation is executed only once on startup and cached in memory; we try to keep the cache accurate by increasing or decreasing it with every upload and delete. In the meantime, I stored about 100GB of personal data on the drive that I am using for one of my storage nodes. The storage node was not aware of that and was still reporting 100GB available while being unable to accept any uploads.
The fix is obvious: run the free space calculation more than once and, if the disk is full, “unsubscribe” from node selection. The only problem with that obvious fix is the sheer number of affected storage nodes. I would like to know if there are any other ways to trick the free space cache, ideally from storage nodes that don’t use the hard drive for anything else. Please check your storage node log for possible out-of-space errors and let’s try to find out how we managed to get to that point.
Note: In the current release the storage node does not show the correct free space values. That is a known issue and will be fixed with the next rollout. A more accurate free space value can be found in the storage node logs on every “upload started” line.