I have over 50 TB that’s sitting in “trash” but has actually already been deleted. Since the lazy filewalker isn’t able to update the trash value and the normal filewalker doesn’t start, I’m losing a lot of storage. What can I do? And when will you fix these issues?
Do you mean it’s not deleted? Like there’s 50TB of data under the files/config/storage/trash/SATELLITE/YYYY-MM-DD folders? I don’t think the scan-on-startup used-space filewalker deletes trash - isn’t it one of the periodic garbage-collection filewalkers?
Because there were a couple of releases that accidentally rolled the version forward and then back, around when trash switched to the YYYY-MM-DD subdirectories… some SNOs noticed they had trash in non-datestamped directories that they had to delete manually.
If you check now, I’d expect a few 2024-06-?? directories to be active, but all 2024-05-?? directories to be gone (along with the old two-letter trash directories).
The trash was deleted a long time ago, but the node dashboard thinks it still has 3 TB of trash, because there’s a software bug in the lazy filewalker and it won’t update the value. And since it won’t update, the node thinks it’s full while it actually has over 3 TB of free storage left to fill up.
Then, in the second picture, the used space is 7.94 TB, but according to the satellite I’m using way more than that. So here I’ll run into the issue that the node can fill the disk to the brim and not realize it’s full.
All in all, it’s just messy, and probably all my nodes have filewalker issues to some degree since the new releases. And probably everyone else’s do too, more or less.
And this makes it even harder to keep track of. Why would you not log the filewalker when not using the lazy filewalker?
The normal filewalker was always that way; we didn’t make it silent. We later added more logging for the lazy filewalker, but that is a different code path, so those additional log lines don’t show up when running the normal filewalker.
There is an easy trick to find out whether the filewalker (both normal and lazy) is still running, and you can also check whether garbage collection is currently running. See “Guide to debug my storage node, uplink, s3 gateway, satellite” and there go for /mon/ps; /mon/funcs is also useful.
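For example, a rough sketch from the shell (the 127.0.0.1:5999 debug address is just an assumption here; the guide above covers how to enable and find your node’s debug endpoint):

```sh
# Assumed debug address; replace it with whatever debug.addr your node actually exposes.
DEBUG_ADDR=127.0.0.1:5999

# /mon/ps lists currently running monitored tasks; filewalker and
# garbage-collection entries only appear here while they are active.
curl -s "http://$DEBUG_ADDR/mon/ps" | grep -iE 'walk|trash|collect'

# /mon/funcs shows per-function counters and timings, e.g. how often a
# filewalker ran and how long it took.
curl -s "http://$DEBUG_ADDR/mon/funcs" | grep -iE 'walk|trash|collect'
```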
I am happy to review your pull request. Feel free to add any logging you would like to see.
Works fine for me. I am not aware of any bugs that need to be fixed. Do you have an error message regarding the lazy filewalker failing? With your current config you are not running the lazy filewalker.
Before we can prioritize a bugfix there needs to be a good bug report first.
Even with the lazy filewalker turned off, it won’t work.
Enabling the pieces scan on startup isn’t working either.
I’ve commented out both options and now I’m just leaving it to run.
I’ve had this issue for almost 2 months now.
I can confirm. I restarted the nodes that had scan on startup enabled, and all of them updated the trash space.
So even the nodes that had been reporting as full are now receiving ingress again.
" To fix the current discrepancy the enabled used-space-filewalker on start is enough (you may comment out the option storage2.piece-scan-on-startup: true, because this is a default value).
But to keep the trash usage updated on your dashboard, you may temporarily disable the lazy mode, i.e. uncomment the option # pieces.enable-lazy-filewalker: false.
So the resulting settings change may look like this:
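(A minimal config.yaml sketch, showing only the two options discussed above:)

```yaml
# scan on startup is the default, so this line can stay commented out or be removed
# storage2.piece-scan-on-startup: true

# run the filewalkers in normal (non-lazy) mode so the trash usage gets updated
pieces.enable-lazy-filewalker: false
```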
This stopped working in May.
See above.
I mean no disrespect, littleskunk, but you can’t be serious that you don’t know the lazy filewalker has had issues since around May, when you started your big tests with the new software updates. I think it started around 1.104.*?
But now that you know, is it possible to get this prioritized?
I’m not a software developer but a network engineer and an SNO, so no, I won’t be doing any pull requests. I would if I could.
But since you want a lot of storage, I’m telling you that you’re probably missing about 50 TB from my end right now, since the nodes think they still have 50 TB of trash, which they don’t. And now I need to switch all my nodes off the lazy filewalker again, since I thought there was an issue with the normal filewalker, but it turns out it just doesn’t log.
Very well, I did my best. The problem lies in your code: it doesn’t update the trash usage database the way the normal filewalker does. There are no error messages I can give you.
Well, it’s actually not a bug: when the filesystem is so busy getting all the uploads out onto the platter, it has no time for ionice’d processes. That’s the whole point here. Eventually those processes get killed when an update comes in, a reboot or maintenance is required, or whatever. So the filewalker never finishes; sometimes it’s even killed and then shows an error.
So yeah, it’s kind of not a bug. But if the process takes too long, its priority should be raised (on Linux, its nice/ionice level actually lowered) so it can finish.
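For what it’s worth, a rough sketch of what that could look like on Linux, assuming you can identify the filewalker subprocess (the PID below is just a placeholder):

```sh
# Placeholder PID; when lazy mode is on, the filewalkers run as separate
# storagenode subprocesses, which you can look for with e.g.:
#   pgrep -af storagenode
PID=12345

# Show the current IO scheduling class and priority of that process.
ionice -p "$PID"

# Move it from the idle class up to best-effort so it still makes progress
# while the disk is busy with uploads (needs root).
sudo ionice -c 2 -n 4 -p "$PID"
```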
I’m also still curious why we need this many filewalkers and can’t merge them into one filewalk per day / every two days or something, especially since the garbage collector files are being stored since the new version. Running these filewalkers sequentially is more performant than running them alongside each other, due to the randomness of the metadata IO.
The new lazy filewalker works great. It moves a lot of data in a short amount of time with no impact on my success rate. Thank you for all these improvements.
There is one small issue. Since the trash cleanup was modified to also run as a lazy filewalker, it doesn’t update the used-space / free-space values. So once per week the storage node will move, let’s say, 500 GB out of 5 TB into trash and update the cache: total space used, 5 TB. One week later the node has grown by another 500 GB, and once again it moves 500 GB into the trash folder.
Expected size: 4.5 TB used + 500 GB trash = 5 TB total. Actual size: 4.5 TB used + 1 TB trash = 5.5 TB, because the cleanup job deletes the data from trash without updating the cache.
@littleskunk Then you knew about this all along? What’s up with the attitude, then? I don’t understand why you need to respond in such an unprofessional, hostile manner when an SNO brings up issues on the forums.