I have over 50 TB that’s sitting in “trash” but has actually already been deleted. Since the lazy filewalker isn’t able to update the trash value and the normal filewalker doesn’t start, I’m losing a lot of storage. What can I do? And when will you fix these issues?
Do you mean it’s not deleted? Like there’s 50TB of data under the files/config/storage/trash/SATELLITE/YYYY-MM-DD folders? I don’t think the scan-on-startup used-space filewalker deletes trash - isn’t it one of the periodic garbage-collection filewalkers?
Because there were a couple of releases that accidentally rolled the version forward and then back, around when trash switched to the YYYY-MM-DD subdirectories… some SNOs noticed they had trash in non-datestamped directories that they had to delete manually.
If you check now, I’d expect a few 2024-06-?? directories to be active, but all 2024-05-?? directories to be gone (along with the old two-letter trash directories).
The trash was deleted a long time ago, but the node dashboard thinks it still has 3 TB of trash, because there’s a software bug in the lazy filewalker and it won’t update the value. And since it won’t update, the node thinks it’s full while it actually has over 3 TB of free storage left to fill up.
Then, in the second picture, the used space is 7.94 TB, but according to the satellite I’m using way more than that. So here I’ll run into the issue that the node can fill the disk to the brim and not realize it’s full.
All in all, it’s just messy, and probably all my nodes have filewalker issues to some degree since the new releases. And probably everyone else’s do too, more or less.
And this makes it even harder to keep track of. Why would you not log the filewalker when not using the lazy filewalker?
The normal filewalker was always that way; we didn’t make it silent. We later added more logging for the lazy filewalker, but that is a different code path, so those additional log lines don’t show up when running the normal filewalker.
There is an easy trick to find out whether the filewalker (both normal and lazy) is still running, and you can also check whether garbage collection is currently running. See “Guide to debug my storage node, uplink, s3 gateway, satellite” and there go for /mon/ps; /mon/funcs is also useful.
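For example, a rough sketch from the shell (the 127.0.0.1:5999 debug address is just an assumption here; the guide above covers how to enable and find your node’s debug endpoint):

```sh
# Assumed debug address; replace it with whatever debug.addr your node actually exposes.
DEBUG_ADDR=127.0.0.1:5999

# /mon/ps lists currently running monitored tasks; filewalker and
# garbage-collection entries only appear here while they are active.
curl -s "http://$DEBUG_ADDR/mon/ps" | grep -iE 'walk|trash|collect'

# /mon/funcs shows per-function counters and timings, e.g. how often a
# filewalker ran and how long it took.
curl -s "http://$DEBUG_ADDR/mon/funcs" | grep -iE 'walk|trash|collect'
```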
I am happy to review your pull request. Feel free to add any logging you would like to see.
Works fine for me. I am not aware of any bugs that need to be fixed. Do you have an error message regarding the lazy filewalker failing? With your current config you are not running the lazy filewalker.
Before we can prioritize a bugfix there needs to be a good bug report first.
Even with the lazy filewalker turned off, it won’t work.
Enabling the pieces scan on startup isn’t working either.
I’ve commented out both options and now I’m just leaving it to run.
I’ve had this issue for almost 2 months now.
I can confirm. I restarted the nodes that had scan on startup enabled, and all of them updated the trash space.
So even the nodes that had been reporting as full are now receiving ingress again.
" To fix the current discrepancy the enabled used-space-filewalker on start is enough (you may comment out the option storage2.piece-scan-on-startup: true, because this is a default value).
But to keep the trash usage updated on your dashboard, you may temporarily disable the lazy mode, i.e. uncomment the option # pieces.enable-lazy-filewalker: false.
So the resulting settings change may look like this:
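(A minimal config.yaml sketch, showing only the two options discussed above:)

```yaml
# scan on startup is the default, so this line can stay commented out or be removed
# storage2.piece-scan-on-startup: true

# run the filewalkers in normal (non-lazy) mode so the trash usage gets updated
pieces.enable-lazy-filewalker: false
```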
This stopped working in May.
See above.
I mean no disrespect, littleskunk, but you can’t be serious that you don’t know the lazy filewalker has had issues since around May, when you started your big tests with the new software updates. I think it started around 1.104.*?
But now that you know, is it possible to get this prioritized?
I’m not a software developer but a network engineer and an SNO, so no, I won’t be doing any pull requests. I would if I could.
But since you want a lot of storage, I’m telling you that you’re probably missing about 50 TB from my end right now, since the nodes think they still have 50 TB of trash, which they don’t. And now I need to switch all my nodes off the lazy filewalker again, since I thought there was an issue with the normal filewalker, but it turns out it just doesn’t log.
Very well, I did my best. The problem lies in your code: it doesn’t update the trash usage database the way the normal filewalker does. There are no error messages I can give you.
Well, it’s actually not a bug: when the filesystem is so busy getting all the uploads out onto the platter, it has no time for ionice’d processes. That’s the whole point here. Eventually those processes get killed when an update comes in, a reboot or maintenance is required, or whatever. So the filewalker never finishes; sometimes it’s even killed and then shows an error.
So yeah, it’s kind of not a bug. But if the process takes too long, its priority should be raised (on Linux, its nice/ionice level actually lowered) so it can finish.
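For what it’s worth, a rough sketch of what that could look like on Linux, assuming you can identify the filewalker subprocess (the PID below is just a placeholder):

```sh
# Placeholder PID; when lazy mode is on, the filewalkers run as separate
# storagenode subprocesses, which you can look for with e.g.:
#   pgrep -af storagenode
PID=12345

# Show the current IO scheduling class and priority of that process.
ionice -p "$PID"

# Move it from the idle class up to best-effort so it still makes progress
# while the disk is busy with uploads (needs root).
sudo ionice -c 2 -n 4 -p "$PID"
```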
I’m also still curious why we need this many filewalkers and can’t merge them into one filewalk per day / every two days or something, especially since the garbage collector files are being stored since the new version. Running these filewalkers sequentially is more performant than running them alongside each other, due to the randomness of the metadata IO.
The new lazy filewalker works great. It moves a lot of data in a short amount of time with no impact on my success rate. Thank you for all these improvements.
There is one small issue. Since the trash cleanup was modified to also run as a lazy filewalker, it doesn’t update the used-space / free-space values. So once per week the storage node will move, let’s say, 500 GB out of 5 TB into trash and update the cache: total space used, 5 TB. One week later the node has grown by another 500 GB, and once again it moves 500 GB into the trash folder.
Expected size: 4.5 TB used + 500 GB trash = 5 TB total. Actual size: 4.5 TB used + 1 TB trash = 5.5 TB, because the cleanup job deletes the data from trash without updating the cache.
@littleskunk Then you knew about this all along? What’s up with the attitude, then? I don’t understand why you need to respond in such an unprofessional, hostile manner when an SNO brings up issues on the forums.