Disable filewalker

I’m currently migrating my (degraded) raidz and hitting the IO limit of my 2 remaining HDDs very hard, as multiple copy operations and a resilver attempt are running.
The filewalker of the 3 nodes on the same array doesn’t make it easy to do anything within the next few days, as it’s reading lots of 4 KiB blocks.

Is there a way to stop or disable the filewalker process?

(I just want to get my migration done as quickly as possible. Afterwards I’ll have one node per HDD and will activate it again. But I’m keeping my nodes online for uptime and egress. Should that IO load continue for another 12 hours, however, I might take them offline… but I want to prevent that.)


I’m not aware of any way to disable it.
The filewalker only runs when nodes start (or restart after an upgrade).
So if they are stressing your disk right now, that may be because they just upgraded.
Once they’re done, it shouldn’t happen before the next upgrade (unless you restart them yourself, or there is a power outage…).

I had to restart my PC a couple of times, so it is still “after startup”, but the filewalker processes of the 3 nodes take more than a day under this load. So I either have to wait for probably 1–2 days until they are done, or shut down the nodes…


Seems the only way is to shut down those nodes and migrate the RAID as fast as possible.
By the way, it’s not a good idea to run all nodes on the same array. But I’m sure you already know that, and now you know why.
This is just one of the reasons.

OK, if you don’t know of a way, then shutdown will be the only option left.
I degraded the array intentionally because I actually want to change it to 3 separate HDDs instead of a raidz1.


No, it’s not implemented as far as I know. So yes, the only option is to shut down all nodes.
I hope that your nodes have the same public IP, so there is less loss for the network.

The HDDs are fine, I don’t expect problems migrating.
10/12TB is from one public IP.

I am currently running 10 nodes on 2 disks and there aren’t any obvious problems. The disk activity was high for some hours after the upgrade (described as galloping unicorns).
Is there anything I should watch for in particular, as I am only really doing this as a test…

From my experience, there is a vast difference in migration time between having a node running on a raidz and having the node turned off.

Also, I would either resilver or copy the data off the raidz; doing both doubles the workload, and running the tasks in parallel instead of sequentially more than doubles the time it takes to complete them.

So I would shut down the nodes and resilver, because that would most likely make the most sense, unless you want to change something like ashift or another normally unchangeable pool property.

If you want to copy the data out, you should be able to use a stop or pause command on the resilvering.
Not sure what the command is, but one could always just pull the drive… the correct drive, ofc :smiley:

I talked to one of the storjlings about that recently, but never got around to making a feature request for an option to modify the filewalker process. It will run every time you restart a storagenode… then, as far as I can tell, it won’t run again once done. Maybe it does if a node runs for extreme lengths of time, but I don’t think so… it sounded a lot like it is basically a storagenode boot process.

And since it’s basically what determines how much space is taken up by the blobs folder and such, it is kind of important for various reasons. But having it run repeatedly when there is lots of free space, or when you’re trying to troubleshoot or, like you, migrate… it’s really annoying.

You want to free up IOPS, and the more IOPS you free up, the faster things will go… and it’s not linear: with HDD seek time under parallel tasks, the time required quickly grows almost exponentially.
Of course, if you’ve got NCQ, a bit of queue depth might not be too bad, but I haven’t really done any tests on that; it’s an HDD hardware feature that helps reduce seek time using some kind of predictive reordering.

Of course, in your case the ARC and L2ARC might carry the brunt of the load, but I have little doubt you can most likely complete your migration in 1/5 of the time or less, compared to running 3 nodes, resilvering, and copying data out all at once, if you shut down the nodes and just migrate…

But at the very least I would kill the resilvering if you are migrating anyway, or else just go with the resilvering; at least with only 2 drives, the odds of another failure aren’t too high xD

If you are using rsync, then maxing out the block size might speed things up with the -B parameter:
rsync -B 131072
But the last time I tried it, I didn’t really see the big difference I thought I got the first time… so it might not be worth it… it seemed to work well the first time I tried it, though… but that may have been because it was a mainly 128K recordsize pool.

And last but not least… what happened?
New drives and a new HDD tray bay thingamajig… I would have expected that to run for years.

There is no stopping a resilver. Even after pulling the drive out, it continued the process at 250 MB/s even though it was just reading… even a reboot didn’t change that :smiley: So I just had to wait for the failed resilver to finish…
Now I’m copying data off it, but all those small files from the nodes make it painfully slow… it’s going to take some days for 14TB…

The simplest of all problems: I ran out of space. So I thought I’d just change my setup from a raidz1 to 3 separate HDDs, giving me back another 8TB that was “wasted” on redundancy. All my data is backed up anyway and rarely accessed, so why waste a precious 8TB that my nodes could use to grow?

I looked into it; apparently if you do a zpool get all,
then in the list you will see a feature@resilver_defer enabled entry for your pool.

To my understanding, this defers resilvering when possible by itself…

You might want to dig into it a bit first, because I’m not 100% sure this is correct, but nor is it really my problem :smiley: If it works, though, it should solve your problem, though I doubt ZFS will still let you use the HDD in ZFS… but I’m not sure that can be helped; at least then you can make an ext4 partition and use the HDD…

You still might be able to do something like that by putting it behind a USB SATA controller.
Anyway, this command with your pool name should make it stop resilvering automatically…

zpool set feature@resilver_defer=enabled zfspool
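To check the state of that feature first, something like this should work (the pool name `zfspool` is taken from above; zpool commands need root and a real pool, so this is a configuration sketch, not something a test box can run):

```shell
# Show only the resilver_defer feature flag for the pool
# (values are "disabled", "enabled", or "active").
zpool get feature@resilver_defer zfspool

# Or scan the full property list for it:
zpool get all zfspool | grep resilver_defer
```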

Of course, you will run into a problem, because when you have taken 8TB out you’ve still got 8TB left and no free HDDs…

It is kind of cool how redundant it is in that configuration…

Kevink: “I’ll just swap this around to optimize my data storage.”

zfs raidz1 pool: “F you Kevink, stop stealing my drives…”

I’ve had quite a few of these fights with ZFS already, lol. Screen-punch worthy.

The resilver is done, so it’s fine; it’s not stealing any IOPS anymore. Now I just need to get all the data onto the other HDDs. I will test the theory about putting the old HDD back in, formatted as a new zfs pool, in a VM first, or I’ll just plug it into my RPi as an external HDD. Copying over the network should work well enough.

(ZFS can be very stubborn sometimes. It doesn’t even let you export the pool while resilvering…)

Yeah, but when you think about why it really does all those things, it’s usually because, if done incorrectly, you can wreck the pool… it’s there to keep people without understanding of the system from wrecking it.

Use the GPT partition name (or whatever it’s called) when you assign the disks to zfs pools; then you can rewrite the GPT and ZFS will release the drive permanently.

Resilver has surprised me many times… I’ve tried to resilver maybe 6 times; even after formatting a HDD, ZFS just spent like 30 minutes and was like… meh, I’m done, all fine, I guess it recovered the data.

Though I’m sad to inform you that to be 100% sure of the integrity of your pool, you will also need to scrub it…

Dunno why, but resilver sometimes lies… not always, but sometimes… so you might be fine… most likely you are fine… I’ve only seen it once in the 5–6 times I’ve resilvered.

Yeah, the VM ZFS test platform is a good idea… dunno why I haven’t gotten around to setting that up yet… I really should.

Took the time and looked for it… I’ve been trying to make this work a few times, so I’ve basically got it down…

This is the location of the GUIDs; use those. Of course, give them some better names first so you can use them to figure out which disk is which… I really like the serial-ID type names from the by-id folder,
but yeah, choose some better names for the GUIDs.

Or I think this is how… anyway, good luck.
Since I haven’t tested it yet, there may be holes in my understanding.

I wish :smiley: I created a new GPT partition table, removed the zfs partition, and created a new partition… but ZFS keeps recognizing the HDD and recreating its partition.


The /dev/disk/by-id name,
which I was using and still am using, isn’t from the partition table; it’s the serial number, so it will never change… but at the time it was what I could get to work…

The /dev/disk/by-partuuid/ names are the GPT partition UUIDs (the GPT names),
so they are not fixed and can be regenerated or changed…

That’s why other people say it’s smart to use those: the pool owner retains some basic control of the HDDs in the system, while if serials and such are used, ZFS will fight to the death.

I’m not 100% sure how to make it work, but I think it should be by using by-partuuid.
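A sketch of what that would look like at pool-creation time (the UUIDs below are invented placeholders; zpool create needs root and real disks, so treat this as a configuration sketch only):

```shell
# List the GPT partition UUIDs the kernel currently knows about:
ls -l /dev/disk/by-partuuid/

# Hypothetical UUIDs: build the pool from by-partuuid names instead of
# by-id serials, so a disk can later be released by rewriting its GPT.
zpool create tank raidz1 \
  /dev/disk/by-partuuid/1a2b3c4d-0000-0000-0000-000000000001 \
  /dev/disk/by-partuuid/1a2b3c4d-0000-0000-0000-000000000002 \
  /dev/disk/by-partuuid/1a2b3c4d-0000-0000-0000-000000000003
```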

Yeah, I should have used those, but… sometimes you just learn through painful experience :smiley:


Make a bug report! Congratulations on having a minimum of 20 characters to post.

it’s not a bug, its a feature… lol


What would the feature even be? lol
Maybe the feature is that it has a bug?

Maybe it is a bug, but maybe it serves a purpose after a failed resilver attempt… Doesn’t matter anymore :smiley:
Btw: you can free a HDD from its pool by creating a new pool on it on a different computer, or by exporting the active pool and then creating a new pool on that HDD. But while the pool stays active, there’s no chance to do anything with that HDD, even after reconnecting it after a failed resilver or similar.
