Nope. Sorry. Really only been doing this for fun and I like the project in theory, but this is very rapidly becoming not fun.
I think you misunderstand the situation. What I’m describing here can ONLY be a bug.
The title of this thread suggests it's the lazy filewalker that isn't saving progress, but that's not what I'm experiencing. Even when the filewalker isn't running, the Used Space in the pie chart on the node page still increases as ingress comes in, and it seems reasonably accurate as long as you don't restart. But THAT figure, the Used Space the node publishes to the node page without any completed filewalker, is what gets deducted from when the node is restarted.

Now, what possible legitimate reason could there be for that? My databases are not locking at the moment of shutdown. So if I shut down the node when that pie chart is reporting 11.2 TB, and on restart the pie chart is reporting 10.5 TB… why is the node not saving that 11.2 TB figure from its previous run, instead of overwriting it with a lower one? If it is fetching the result of the last successful filewalker or something, that's terrible design. It should keep the value of the most recent calculation, which would be from the point of the last shutdown. Where it gets this dramatically lower figure on restart, I have no idea. But simply restarting should not overwrite that figure with a clearly lower, incorrect value.
If you're going to continue to insist that this is working as intended, that simply restarting the node SHOULD dramatically lower that used-space value despite there being no db locking at the point of shutdown and nothing discernibly wrong on an absolutely basic, vanilla docker-on-Linux install, then yeah, this is no longer a fun experiment.
df is showing I have 1.1 TB of free space left on my disk, while the node page thinks there's 1.74 TB free. The used-space filewalker for Salt Lake began 36 hours ago with no visible progress and plenty of idle CPU to spare. It's a race: if the software can fix itself before this node implodes from running out of real space, then fine, I'll let it keep going. If it can't, and reporting the problem does nothing but elicit "Your equipment must be bad" when I know that's not the case, then let it burn. Sorry. Had enough.
(I mean, seriously, why would you not have code that runs an instant df and makes sure that the free space as perceived by the node CANNOT POSSIBLY be higher than the free space per df, under any circumstance? We’ve talked about this before, and I honestly cannot understand these design oversights.)
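The sanity check I mean is a few lines. A minimal sketch (the function name is mine, and this is an illustration of the idea, not a claim about how the node should be structured internally): clamp the node's perceived free space to what the OS actually reports, which is the same number df shows.

```python
import shutil

def clamped_free_space(node_perceived_free, storage_path):
    # shutil.disk_usage reads the same filesystem statistics df does.
    actual_free = shutil.disk_usage(storage_path).free
    # The node's belief can never exceed physical reality.
    return min(node_perceived_free, actual_free)
```

With a check like this, a node that believes it has 1.74 TB free on a disk where df reports 1.1 TB would report 1.1 TB, and it would stop accepting ingress before the disk actually fills.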