Tuning the filewalker

It would be nice if one could not only start and stop it, but also choose a priority level.

I got the idea from ZFS scrubbing, which is very dynamic: there is a minimum allowed amount of IO, and beyond that the scrub runs at a lower priority than the other active workloads on the pool.

That works pretty amazingly. Even though a scrub can take a long time when the pool is really busy, it doesn’t introduce much latency and doesn’t really get in the way of anything.

And it will of course speed up when there are ample free resources. I’m not sure about the exact mechanics of how it’s configured, but it might be a good model to borrow the design from.
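For reference, OpenZFS exposes that min/max throttling model through kernel module parameters. The two below are real OpenZFS tunables (names and defaults can vary between versions), shown only to illustrate the floor/ceiling design, not as something the storagenode uses:

```shell
# Floor: scrub IOs kept in flight per vdev even when the pool is busy
cat /sys/module/zfs/parameters/zfs_vdev_scrub_min_active
# Ceiling: scrub IOs allowed per vdev when the pool is otherwise idle
cat /sys/module/zfs/parameters/zfs_vdev_scrub_max_active
```

A filewalker equivalent could expose a similar pair of knobs: a guaranteed minimum scan rate plus a ceiling it only reaches when the disk is idle.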

I don’t think the config flag name is very important, so long as it’s clearly stated in the documentation, both in the --help output and on docs.storj.io.

However, storage2.initial-piece-scan is a more accurate name; storage2.initial.check could easily be mistaken for something else.

So my vote is for storage2.initial-piece-scan.

Please allow me some specialist idiot questions :slight_smile:

Is the initial piece scan also done after an automatic docker node update?

Can the piece scan be started manually?

Wouldn’t it be better not to disable it in the config, but with a docker run flag, which would allow quicker on/off switching by just restarting the node with the flag set on or off? (e.g. to manually restart every 3 months or so)


Currently the filewalker runs basically every time a node is restarted or updated, which is really annoying, especially when troubleshooting.

I do think that if the node is restarted mid-filewalker, it will resume, but I’m not 100% sure about that; it just looks that way from what I’ve observed.

Being able to start it manually, without having to restart the node, would be pretty nice, that’s for sure. Of course, I’m not sure whether other node-start tasks run alongside it; if there aren’t any, a manual trigger is arguably redundant with a restart.

Except consider the example of someone normally running the node without the flag, then restarting to run the filewalker, and then forgetting to turn the filewalker off again: it would start again on updates and such. So it would involve a lot of turning the node off and on again with different run commands, which would be rather annoying.

You can pass storagenode binary flags in the docker run command.
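For example, a run command along these lines should work, assuming the feature ships under the storage2.initial-piece-scan name proposed above. The flag is hypothetical until it actually lands, and the mounts are abbreviated compared to the full command on docs.storj.io:

```shell
docker run -d --restart unless-stopped --name storagenode \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
  storjlabs/storagenode:latest \
  --storage2.initial-piece-scan=false
```

Anything after the image name is passed straight through to the storagenode binary, which is why this works without touching the config file.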

I think those were some good specialist idiot questions. lol
They made me think of a few things I hadn’t considered.


Thank you :slight_smile: It might be helpful to add a flag which allows ONE initial filewalker run and then disables it, until the node is manually restarted again with the same single-scan flag (excluding automatic restarts).

I think that is the current behaviour - on start it executes one filewalker scan and then nothing else until the process is restarted. Updating the storagenode implies a restart too.

Currently not, and implementing that would require a lot more changes. It may be something that could be implemented in the multinode dashboard (for example) and triggered via API, but doing that properly would require some insight into how the process is doing (% of completion, speed, etc).

A lot of changes.

That’s the point. :slight_smile:

AFAIK the config gets reloaded with every restart.
So if you first start the node with the scan enabled in the config and then change the config to disable it, it should not run anymore unless you set it back to enabled.
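Concretely, toggling it that way could look like this, using the storage2.initial-piece-scan name proposed above. Both the flag name and the paths are assumptions, not a documented procedure:

```shell
# disable the scan for subsequent restarts (flag name is hypothetical)
echo "storage2.initial-piece-scan: false" >> /mnt/storj/storagenode/config.yaml
# restart with a generous stop timeout so the node can shut down cleanly
docker restart -t 300 storagenode
```

Removing or flipping that line and restarting again would re-enable it.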


I would suggest storage2.startup-piece-scan, as ‘initial’ could be understood in the ambiguous sense that it would only run on the first-ever startup of the node, which is not the case; the scan gets performed on every start, as I understand it. So maybe even storage2.piece-scan-on-start would be clearer (don’t forget we are non-native speakers too).


I wonder if this could be made an int that is checked against a counter. A user could then set it to the number of restarts after which a scan will be performed.
This would be very similar to the file system check scan in Linux:

-c max-mount-counts
Adjust the number of mounts after which the filesystem will be checked by e2fsck(8). If max-mount-counts is 0 or -1, the number of times the filesystem is mounted will be disregarded

I don’t know but I bet there is a counter somewhere in the code that keeps track of how many times a node has been started.


Another option could be to execute it only if the node has not exited cleanly.

Personally I don’t know the internals of Storj, nor am I an experienced Golang programmer, so any of those options is far too difficult for me :nerd_face:

However, having the option to disable it via config is already a great improvement over having to build the binary yourself to do that.


You’re telling me!
However, if there is already a counter in some variable, it should be easy to check it against a number set in the config and run the scan only if it meets the criteria.

You can provide it as an argument (which requires re-creating the container) or update it in the config file and restart the container.


@littleskunk @BrightSilence
Are you aware if such a counter exists and if it does, how to access the current value?

Your logs. If you redirected them to a file, you can search for "Node ":

grep "2022-08" /mnt/storj/storagenode/storagenode.log | grep -c "Node "

I like to set up and forget; having to periodically switch a config flag on/off doesn’t seem like the best solution.
I suggest starting the filewalker on restart, but only if it hasn’t run in the last X days.
So we can have 2 flags:

  • storage2.startup-piece-scan: enable the filewalker on start (default 1: enabled)
  • storage2.startup-piece-scan-min-age: days to wait between two scans (default 0: scan on each restart)

This way we keep compatibility with the current behaviour.


It could be a single flag, storage2.startup-piece-scan: -1 for disabled, 0 for a scan on each restart, and > 0 as days after the last start.
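That single-flag semantics could be sketched like this. It is a hypothetical illustration, not the actual storagenode code; in particular, the marker file holding the time of the last scan is an assumed mechanism:

```shell
#!/bin/sh
# maybe_scan SCAN MARKER
#   SCAN:   -1 = never scan, 0 = scan on every restart,
#           N>0 = scan only if the last scan is at least N days old.
#   MARKER: file holding the unix time of the last scan (assumed mechanism).
maybe_scan() {
  scan=$1
  marker=$2
  if [ "$scan" -lt 0 ]; then
    echo "piece scan disabled"
    return
  fi
  now=$(date +%s)
  last=0
  [ -s "$marker" ] && last=$(cat "$marker")
  age_days=$(( (now - last) / 86400 ))
  if [ "$scan" -eq 0 ] || [ "$age_days" -ge "$scan" ]; then
    echo "running piece scan"
    echo "$now" > "$marker"    # remember when we last scanned
  else
    echo "skipping piece scan"
  fi
}

maybe_scan -1 /tmp/demo-scan-marker   # → piece scan disabled
```

With -1 nothing ever runs, with 0 every call scans (today’s behaviour), and any positive value rate-limits the scan to once per that many days.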


Normally the filewalker process should not interfere with node or server operation.
It should recognize when it is appropriate to run and when it is not: when the system is already under stress, it should not put additional load on it.


I love this idea, but I’ll make a projection here and say: this is possible, but it would probably require insecure additional permissions for the container (kernel-space access on the host to check physical disk behavior). If there were some user-space host app that it could depend on, maybe that could work, but that may require additional host configuration, which is not ideal either.


I don’t know if I can follow that, I am not a programmer.
But I have seen other programs that are certainly capable of it. Folding@home, for example, has settings for when to run.

And on Linux there is cat /proc/loadavg, for example, which reports the system load.
I think it would be a start if the filewalker checked such system-generated information.

With a Docker running with root privileges this should be doable I think.

Edit: From my understanding, the filewalker is essentially a huge loop over all files.
With an adjustable sleep between files, it could be slowed down when required and therefore consume fewer resources.
Within the sleep loop it could check the system load and either proceed to the next file or keep sleeping.
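The loop described above might look something like this. It is a toy sketch, not the real Go implementation; the load threshold and the flat file layout are assumptions:

```shell
#!/bin/sh
# throttled_walk DIR MAX_LOAD
#   Visit each regular file in DIR, but pause while the 1-minute load
#   average (first field of /proc/loadavg) is at or above MAX_LOAD.
throttled_walk() {
  dir=$1
  max_load=$2
  count=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # back off while the host is busier than the operator tolerates
    while [ "$(cut -d. -f1 /proc/loadavg)" -ge "$max_load" ]; do
      sleep 1
    done
    # the real filewalker would stat the piece file here
    count=$((count + 1))
  done
  echo "scanned $count file(s)"
}

# example (path is illustrative):
# throttled_walk /mnt/storj/storagenode/storage/blobs 8
```

The scan then naturally stretches out on a busy system and speeds back up when the load drops, which is exactly the ZFS-scrub-like behaviour discussed earlier in the thread.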


The load level should be adjustable then, so that everyone could set the maximum load they tolerate.

2 Likes