Tuning the filewalker

The thing is, as an SNO I don’t want to make up my mind about this kind of stuff, and a lot of it is beyond my control anyway:
I have no control over when an update happens → the file walker starts
I have no control over how many uploads or downloads are ‘forced’ on my node by the satellites or the customers
Depending on the computer the node is running on, I have no control over what other software is doing
Or maybe I simply need to do some other resource-intensive work myself.

And then the file walker comes along and starts a massive scan over terabytes of data without any consideration of whether it is appropriate or even required.
I certainly would not want to lose ingress or egress because the file walker consumes my disk bandwidth.

1 Like

I run the node in a VM. The node can take as much CPU as is allocated to the VM and can saturate the disk IO, since the pool is used primarily for Storj. My monitoring scripts take up some CPU time, so if the node waited for the load to drop to zero before starting the filewalker, it would never start.

The filewalker could just run at the lowest IO priority (ionice); that way it would have the least impact on the performance of the node.

I am a bit biased here though, since the filewalker does not really have an impact on my server.
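
For illustration, underneath this is the Linux ioprio_set syscall, which a Go process could in principle call on itself. A rough standalone sketch, not actual storagenode code; the constants are copied from the kernel’s ioprio.h, and the syscall number is the one exposed by golang.org/x/sys/unix on Linux:

package main

import (
    "fmt"
    "os"

    "golang.org/x/sys/unix"
)

// Values from the Linux ioprio.h header; they are not exported by x/sys/unix.
const (
    ioprioClassShift = 13
    ioprioClassIdle  = 3 // IOPRIO_CLASS_IDLE, the class used by `ionice -c 3`
    ioprioWhoProcess = 1 // IOPRIO_WHO_PROCESS
)

// setIdleIOPriority asks the kernel to give this process disk time only when
// no other process wants it. Linux only; a real change inside a multi-threaded
// Go service would also have to consider which OS threads the setting reaches.
func setIdleIOPriority() error {
    ioprio := ioprioClassIdle << ioprioClassShift
    _, _, errno := unix.Syscall(unix.SYS_IOPRIO_SET,
        uintptr(ioprioWhoProcess), uintptr(os.Getpid()), uintptr(ioprio))
    if errno != 0 {
        return errno
    }
    return nil
}

func main() {
    if err := setIdleIOPriority(); err != nil {
        fmt.Fprintln(os.Stderr, "ioprio_set failed:", err)
    }
    // ...start the disk scan here...
}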

3 Likes

With the file walker set to idle, maybe. But there is plenty of room to choose between running only when idle and running when the machine is totally overloaded. Currently the file walker runs no matter what the load is.
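
Just to illustrate that middle ground: the node could, for example, postpone the scan while the machine is busy. A rough sketch of such a gate, reading the 1-minute load average on Linux; the threshold is made up and would have to be configurable:

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
    "time"
)

// loadAvg1 returns the 1-minute load average from /proc/loadavg (Linux only).
func loadAvg1() (float64, error) {
    data, err := os.ReadFile("/proc/loadavg")
    if err != nil {
        return 0, err
    }
    return strconv.ParseFloat(strings.Fields(string(data))[0], 64)
}

// waitUntilQuiet postpones the caller until the load drops below the threshold,
// rechecking once a minute. Purely illustrative, not storagenode code.
func waitUntilQuiet(threshold float64) {
    for {
        load, err := loadAvg1()
        if err != nil || load < threshold {
            return
        }
        fmt.Printf("load %.2f, postponing the scan\n", load)
        time.Sleep(time.Minute)
    }
}

func main() {
    waitUntilQuiet(2.0) // made-up threshold
    // ...start the file walker here...
}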

1 Like

The ionice thing is pretty amazing… it can slow it down so much that it is nearly paused.
It’s basically what ZFS does to solve the same issue.

Instead of the filewalker or a scrub taking 50% of the resources, it gets relegated to maybe 10 IOPS…
so less than 10%, maybe even 2.5%, if we assume the HDD can manage 400 IOPS.

I think that is the superior solution.

A potential issue just came to mind if age is used as the interval: it could again result in multiple instances of the file walker running at the same time after a reboot.
Let’s say you have multiple nodes on one computer, and you have set the file walker interval to different ages to prevent them from running at the same time. But if all nodes exceed the max age while running and the SNO then performs a reboot, every node would perform a startup file walker when it comes back online.
That’s not a good outcome.
So I guess the solution must be that the interval timer starts fresh after a restart of the node.
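
Something like this rough sketch is what I have in mind: count the interval from process start and add a small random offset, so nodes restarted together still spread out. Illustrative only, not storagenode code:

package main

import (
    "math/rand"
    "time"
)

// scheduleScan runs the scan only after a full interval has passed since the
// node process started, plus a little jitter, and then repeats on the interval.
// A restart therefore resets the countdown instead of triggering an extra scan.
func scheduleScan(interval time.Duration, runScan func()) {
    rng := rand.New(rand.NewSource(time.Now().UnixNano()))
    jitter := time.Duration(rng.Int63n(int64(time.Hour)))
    go func() {
        time.Sleep(interval + jitter)
        for {
            runScan()
            time.Sleep(interval)
        }
    }()
}

func main() {
    scheduleScan(7*24*time.Hour, func() {
        // ...the used-space scan would go here...
    })
    select {} // keep the sketch running
}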

1 Like

That’s a good point. One could derive some sort of start time from the “unique” node creation time, so that the filewalkers don’t run at the same time… or it could just be random. Of course, with both solutions there might be rare cases where the filewalkers overlap.

So maybe not the best solution…
but I’m not sure what else to do without some top-tier controlled method…

Hmmm, maybe we can get the storagenode update system to control the filewalker…
I mean, the updates already run staggered…

Maybe we can piggyback on that and simply run the filewalker once per storagenode update, and then maybe weekly or monthly after that…
in case the updates slow down one day…

That could work, I think, and should be fairly easy to implement…
Any thoughts on this idea, @littleskunk?

Revisiting this topic, I’ve noticed that you’re giving these numbers. Maybe I forgot about them before, sorry. I do not observe the file walker taking that much time. Here it was ~12 minutes per terabyte (with caches cleared), which went down to ~8 minutes per terabyte with -I 128, and even less without clearing the cache. So, more than an order of magnitude less than yours.

Yet Stob’s observation suggests the file walker process also hits >24h times.

I now wonder whether there are any common qualities between your and @Stob’s nodes that make the file walker so long.

It would also definitely help if we could have more precise measurements of the time consumed by this process, as opposed to guessing it from IO usage graphs.
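
For a rough number that does not depend on IO graphs, one can simply time a stat-everything walk over the node’s blobs directory. This is not the node’s actual filewalker, just a comparable standalone sketch; the path is an assumption for my setup, and drop the page cache first if you want a cold-cache figure:

package main

import (
    "fmt"
    "io/fs"
    "os"
    "path/filepath"
    "time"
)

func main() {
    // Assumed location of the pieces; pass your own path as the first argument.
    root := "/mnt/storj/storagenode/storage/blobs"
    if len(os.Args) > 1 {
        root = os.Args[1]
    }

    var files, size int64
    start := time.Now()

    err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if d.IsDir() {
            return nil
        }
        info, err := d.Info() // the stat is roughly what the filewalker pays for per piece
        if err != nil {
            return err
        }
        files++
        size += info.Size()
        return nil
    })
    elapsed := time.Since(start)

    if err != nil {
        fmt.Fprintln(os.Stderr, "walk failed:", err)
        os.Exit(1)
    }
    fmt.Printf("%d files, %.1f GiB, %s (%.0f files/s)\n",
        files, float64(size)/(1<<30), elapsed.Round(time.Second), float64(files)/elapsed.Seconds())
}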

1 Like

These are all great ideas. Just open a pull request to show us how easy it is to implement.

In the meantime we will just merge that one pull request that was submitted.

1 Like

Yeah, of course I will, with my programming experience extending to limited bash scripts.

I mean, it has to be easy to do… the filewalker already runs at startup, and it can be disabled.
So having it run only after updates should be fairly straightforward, and since it would run at update time, it would automatically run staggered, because the updates do.

The basics of the entire thing are already there…
And just FYI, it gets old real fast when developers just tell us to program it ourselves…

1 Like

So next time I just leave you alone in the thread and we don’t even get the first pull request? What do you expect me to do here?

I’d gladly help out with some Basic:

10 PRINT "HELLO WORLD"
20 GOTO 10

RUN

SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR
SYNTAX ERROR

:rage: :rage: :rage: :rage: :rage: :rage:

:joy:

3 Likes

I’m not a developer; I was expecting some sort of feedback on whether the idea could be a viable solution.
From what I understand, it should solve all the issues and not entail much work.

But again, I’m not a developer, so I’m sure there are considerations I’m not able to take into account.

Sorry for disturbing you.

I still think I’m on to something with this, though…
and this filewalker thing has been a pain for SNOs since V3 started.
I’m just trying to provide good ideas, so we might one day have a good solution for everyone involved.

2 Likes

Please, let’s keep the conversation healthy and productive.

There are a lot of things that may look “easy to do” because the logic behind them seems simple, or because other software does the same, but the reality of the source code is different.

In this case, the filewalker is a linear process executed at maximum speed.

  • Being able to control the speed/intensity of that process probably means rebuilding it from scratch.
  • Being able to schedule it in the future based on a timestamp, or every X days, probably means adding a lot of code to convert it from a regular function into something closer to a cron task.
  • The updater is a separate process; that’s needed for how it works, and it makes it very difficult for the updater to know whether the node is executing something internally.

Ideas are totally welcome in a conversation like this, but please do not assume whether something is easy or not. Also, keep in mind that Storj developers are not working just on the storage node (and I’m completely grateful to the three people involved in reviewing the pull request for taking the time to review it and leave comments).

2 Likes

The storage node itself doesn’t know whether it was started after an update, a reboot, a system crash, or manually. That doesn’t mean much, though, because my strength is more the code that is already written and not so much the code change needed for this. It might still be a relatively short code change.

Same for the ionice level. It can be set in golang. I haven’t seen it in our code, so I am unable to provide any code examples. It might also be a relatively short code change.

2 Likes

Thank you.

Yeah, but what if the default were the filewalker turned off, and the storagenode software update turned it back on temporarily, so that it runs once after the update but not again at the next node startup.

Then the filewalker would end up running staggered like the updates. I know it’s a bit of a patchwork method, but that’s to limit the complexity of the concept and keep it easy to work with.

And like I stated before, there are some issues. Another one I just realized: if the filewalker is stopped for whatever reason, it won’t have finished… which might be why it currently always runs:

to ensure that it never ends up in a state where it is unaware of the used capacity.
But that’s of course just me guessing.

Stuff gets complicated so quickly, and often there are reasons things are done a certain way to begin with…

I’m sure if there were an easy solution it would have been done already… or most likely…
It’s not always easy to see the forest for the trees.

The L2ARC seems to help a lot. GC runtime is down to a few minutes. I think the main difference is that the L2ARC survives a reboot.

2 Likes

Yeah, the L2ARC is pretty amazing for all kinds of things. Mine serves on average 5-10% of the ARC IO,
which would otherwise have had to come from the HDDs. That isn’t an insignificant amount… it’s comparable to 1/5 or 1/10 of my total HDD IO during sequential reads.

And what the L2ARC mostly deals with is random IO. There are also a lot of configuration options for the L2ARC, but I generally just keep the defaults.

I found that changing logbias from latency (the default) to throughput also gives amazing results,
using the command:
zfs set logbias=throughput poolname

Display the current setting with:
zfs get logbias poolname

And return to the default with:
zfs set logbias=latency poolname

I have seen nearly 10x better performance in some cases. Of course it does increase the latency of the pool, but really, for 10x more throughput… Running it on latency just makes everything so much slower that each task takes 10x the time and more work piles up…
So I’d rather take the latency penalty and run at max throughput.

I haven’t really looked at how the latency changes from doing this, though…
It seems to work amazingly, but I’ve only been using it for maybe 3 months… so I might still run into issues in the future… sometimes it takes a while to find the downsides.

Also keep in mind these are rough numbers and I haven’t investigated it thoroughly…
I think when I saw the 10x results it was on a Windows VM, so I can’t say whether it’s 10x for Storj data… however, I initially started using it on Storj, which is why I ended up trying it on the Windows VM disk.
It has become my de facto default setting for all ZFS pools, so far with no noticeable ill effects.

Oh yeah, and remind me again… what is GC runtime?
I think you might have told me before, but I can’t remember what it is.

1 Like

It’s garbage collection

1 Like

The pull request has been merged!

In the next release we will be able to control the initial filewalker process with:

storage2.piece-scan-on-startup: true|false

in the config!

15 Likes

Sorry for the extra work with getting the pull request merged. We are currently changing our build system, and the unit tests are not running as stably as they used to.

3 Likes