It seems that the new feature “save-state-resume GC filewalker” isn’t functioning as expected

Thinking about it, how exactly will this work? I am not sure if this solves it.
Two pain points I was thinking of:

  • Does the used-space filewalker need to run immediately after a restart? Other processes also consume IOPS at startup, and the used-space filewalker does not seem to be the most urgent of them. So its start could be delayed by a time set by the SNO. That way, if the SNO restarts the node again within that timeframe, the filewalker would not start.

  • Second, it would start from the beginning every time. That does get solved. But after it has finished and I restart the node, it would start again from scratch, right? So let’s say it was running for 3 days and all data is finally up to date. When I restart the node on day 4 to increase storage space, the filewalker starts again even though it does not really need to run again, right?
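To illustrate the first point, here is a minimal sketch of a delayed start. It is not how the storagenode works today. `DelayedFilewalker`, `delay_seconds` and the `scan` callback are made-up names for illustration only.

```python
import threading


class DelayedFilewalker:
    """Hypothetical wrapper: run `scan` only after the node has been up for `delay_seconds`."""

    def __init__(self, delay_seconds, scan):
        # threading.Timer calls `scan` once after the delay unless it is cancelled first.
        self._timer = threading.Timer(delay_seconds, scan)
        self._timer.daemon = True

    def node_started(self):
        # Start counting down from node startup.
        self._timer.start()

    def node_stopping(self):
        # A restart inside the delay window cancels the pending scan.
        self._timer.cancel()
```

With this, a node that is restarted again before the delay has passed never starts the scan in that session.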

I believe I have observed that the filewalker restarts after some time if it has not completed successfully. This is why I have turned it off on some nodes. So a node restart is not required for that. But if it keeps running into the same issue over and over without making progress, it is a waste of resources and it would be better to stop. This could be done with a maximum-retry setting or something similar.
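A maximum-retry setting could look roughly like this. It is only a sketch: `run_with_retry_limit`, `scan` and `max_retries` are hypothetical names, not existing storagenode options.

```python
def run_with_retry_limit(scan, max_retries):
    """Call scan() until it succeeds or the retry budget is used up.

    Returns True on success, False once it has given up.
    `scan` and `max_retries` are hypothetical, for illustration only.
    """
    # One initial attempt plus up to `max_retries` retries.
    for attempt in range(1 + max_retries):
        try:
            scan()
            return True
        except Exception as err:
            print(f"used-space scan attempt {attempt + 1} failed: {err}")
    # Stop here instead of retrying forever and wasting IOPS.
    return False
```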

Let me try to explain.
First, with such a periodic run I would be able to catch errors when other processes have issues updating the used space, like we are currently seeing with the trash.
Second, it could also reduce the number of used-space filewalker runs altogether. Either I turn off running on restart and rely on the periodic runs, or it runs on startup only if the configured period has passed. Currently it runs on every startup (when run on startup is set to true).

For example:
Currently, it runs at least approximately every 2 weeks, after each update.
When I turn it off, it never runs.
Now I set the period to 3 weeks. I set run on startup to true, and it runs once and completes successfully. When I restart 3 days later and again 1 day after that, it won’t run because 3 weeks have not yet passed since the last successful run. Even on the next update it will not run, because 3 weeks have not passed yet. Depending on the implementation, it could either run automatically once 3 weeks have passed, or run on the next restart after 3 weeks have passed, so it would re-run after the next update at the latest.

Other example:
I turn off run on startup. But I set period to 2 weeks. Then I can restart the node as many times as I want, it would not restart the used-space filewalker. Even on update, no re-run.
But 2 weeks after the last successful finish, it would run.
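Both examples can be put into one small decision function. This is just a sketch of my idea, with made-up names (`should_run`, `period`, `run_on_startup`, `is_startup`). I assume here that a periodic check also triggers a run once the period has passed, which is only one of the two possible implementations.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run(now: datetime,
               last_success: Optional[datetime],
               period: Optional[timedelta],
               run_on_startup: bool,
               is_startup: bool) -> bool:
    """Decide whether the used-space filewalker should run now (hypothetical logic)."""
    if period is None:
        # Current behaviour: only the run-on-startup switch matters.
        return is_startup and run_on_startup
    # Due if it never finished successfully or the period has passed since it did.
    due = last_success is None or now - last_success >= period
    if is_startup:
        # A restart triggers a run only if allowed and the period has passed.
        return run_on_startup and due
    # Periodic check while the node is running (assumed variant).
    return due
```

In the first example, restarts on day 3 and day 4 return `False`, and the first restart after 3 weeks returns `True`. In the second example no restart ever returns `True`, but the periodic check does once 2 weeks have passed.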

That was my idea. I hope it is clearer now.
