Update Process (Filewalker) + When to update

Hello,
I have some questions regarding the update process of the docker image.

I am updating manually, as it gives me more freedom to update when I have time, in case I also need to troubleshoot my nodes. From the thread about updating, I gather that checking for updates every 2 weeks should be fine.

Can someone tell me more about the file walker process which allegedly runs after every update? What is it doing, how much RAM does it need, is it checking every file? How does the file walker handle network drives? (I know, not recommended, deal with it ;D)
At the moment I am updating when I really have to because I don’t want to stress the systems too much.
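For reference, this is roughly the manual flow I use; the container name and the run flags are placeholders for my actual setup, so substitute your own:

```shell
NODE=storagenode                           # placeholder container name
docker pull storjlabs/storagenode:latest   # fetch the new image before stopping
docker stop -t 300 "$NODE"                 # long timeout so in-flight transfers can finish
docker rm "$NODE"
# then start again with the exact same flags/mounts as your original `docker run`:
# docker run -d --name "$NODE" ... storjlabs/storagenode:latest
```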

Also: the command curl -s https://version.storj.io | jq .processes.storagenode.minimum.version returns 1.16.1 at the moment. What does this mean? Isn’t that the latest available version, not the minimum?

Thanks!

PS: I already read this thread: Keeping your node up to date, changes to storage node software! PLEASE READ this thread!

I think you are on your own there because you are running an unsupported configuration, but there may be some other ninjas who run it like you.

You should update to 1.16.1; the rollout has gone to all the Windows nodes and now to Docker too.

Thanks for your answer!

I know that network drives are “unsupported”, but my question is more general.

What does the file walker do? Is it running after each and every update? Is it running in between updates?

What about this command? Why is it returning the actual version and not the minimum?

OP’s mistake was admitting they’re not following the “official” procedure. Now we won’t get any answer for the actual question. Instead, we’ll get ten answers telling you how bad it is to not use the official updater.

Sorry for your loss, HeroHann.

I still have hope ;D

I think it only checks the metadata or presence of files, because on my node it usually only accesses the cache.
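If you want to see what such a metadata-only walk amounts to, a rough stand-in is to stat every piece file without reading its contents. The blobs path here is a guess at a typical mount point, not something from my setup:

```shell
# Stat every piece file (size only, no data reads) -- roughly what a
# metadata-only walk costs in iops. BLOBS is an assumed mount point.
BLOBS=/mnt/storagenode/storage/blobs
if [ -d "$BLOBS" ]; then
  time find "$BLOBS" -type f -printf '%s\n' |
    awk '{n++; b+=$1} END {printf "%d files, %.1f GiB\n", n, b/2^30}'
fi
```

If that finishes quickly while the drive light stays on, the time is going into metadata seeks, not data transfer.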

I also update manually, but I do it whenever there is a new version available. I think it’s OK to “stress the system” - better to find out quickly that a drive is failing than to wait until multiple drives have failed. I also run zpool scrub once a month (that’s the default and I have not disabled it) - again, better to find out that a drive is dying sooner rather than later.

Thanks for your reply. Have you ever monitored the reads and writes on the drives after an update? Does the file walker have a big impact?

I usually wait for a week or so after a new version is available due to new bugs in new versions.

Can you tell me anything about this curl command?

I’ve been after some more information on this too, as my drive seems ‘busy’ for a week after an update or storagenode service restart.

(screenshot: drive I/O graph)

The fact it’s not actually reading or writing much suggests it is checking metadata or file presence like @Pentium100 suggests.

The curl command just reads the JSON data from the https://version.storj.io response. Try it in a browser.
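For example, you can pull both the minimum and the suggested version out of the same response. The field names are as I see them in the JSON today, so verify against your own output:

```shell
# Show minimum vs suggested version side by side (field names as observed
# in the current version.storj.io response -- check your own output).
curl -s https://version.storj.io | jq -r \
  '.processes.storagenode | "minimum:   \(.minimum.version)\nsuggested: \(.suggested.version)"'
```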

Thanks to you!

Yeah, it looks like the file walker is not “doing” much, but the drive is at 100%. Metadata, as you said. Will try it with my network drives.

About the command: then I don’t get what it’s supposed to tell me. I thought it would show the oldest supported version, so I’d know when to update. Instead, it shows the newest version.

Where do I find out which is the minimum supported version, i.e. the point from which I really HAVE TO update?

I don’t have the official answer but it wouldn’t surprise me if 1.16.1 is now the latest and only supported version due to the recent order errors and fixes from the previous 1.15.3 version.

@Alexey might have the official answer.

When processes.storagenode.minimum.version == your running version, you should update; otherwise, well… you know the PS part of your post.
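In shell terms, that check could be sketched like this (a version-ordering trick using GNU sort -V; the function name is my own):

```shell
# Succeeds (exit 0) if running version $1 is >= minimum version $2.
# sort -V orders semver-style strings; the minimum must not sort above ours.
version_ok() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

version_ok 1.15.3 1.13.0 && echo "still supported"
version_ok 1.15.3 1.16.1 || echo "below minimum -- update now"
```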

Maybe I don’t get it or I am just dumb.

When do I have to update?
I am running 1.15.3 now and 1.16.1 is out (some people are having problems).
When is my time to absolutely do the switch?

PS part?

Since you are not relying on automatic updates, you should update when the rollout is at 100%, i.e. when all nodes (GUI/Docker/others) can update.
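The same version endpoint seems to expose rollout progress too. The field path .processes.storagenode.rollout.cursor and the all-"f"s interpretation are my reading of the JSON, so verify against your own response:

```shell
# The cursor appears to be a hex string; all "f"s seems to mean 100% of
# node IDs are eligible for the new version (my interpretation -- verify).
rollout_done() {
  case "$1" in *[!f]*) return 1 ;; *) return 0 ;; esac
}

cursor=$(curl -s https://version.storj.io | jq -r '.processes.storagenode.rollout.cursor')
if rollout_done "$cursor"; then echo "rollout complete"; else echo "in progress: $cursor"; fi
```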

Post Script.

Yeah, I SHOULD update, but as you might have seen on the forum, some people are running into problems. So with everything I own, I wait some days / weeks to update if it is not absolutely necessary. That’s why I ask not when I should, but rather when I MUST update.

yeah same here… delayed my update just to wait and see if more problems start flooding the forum.

within the next month or so… i think the rule is 3 versions behind… i forget if that’s counted on the major or the minor number… so on 1.15.3 the latest you should still be running would be 1.17.x, maybe 1.18.x, depending on how the 3-version difference is counted… i mean 1.15.3, 1.16.1, 1.17.x would be a range of 3 versions, making 1.18.x the point where one would get suspended… while if one counts 1.15.3 + 0.3.0, then 1.18.x would be the last allowed and 1.19.x would get one suspended…

but i would personally try to stay close to the most recent rather than lagging behind…

haven’t really looked into all that because i don’t plan on getting that far behind.

if there aren’t any major issues that keep popping up on the forum, i expect to update in the next few days.

and was it a suspension tho… or was it a DQ i forget… like i said, you don’t want to test that stuff really, especially if the punishment was DQ… i remember getting a bit insulted by it, so may have been DQ

but not really relevant, because i don’t plan on finding out where that limit is… though i may skip 1 version if it turns out to be problematic… ofc skipping one update makes one’s node slightly different from the norm and might in itself cause problems long term… in theory at least… but who knows…

as far as i’m aware, the filewalker has so far been running on every restart of a node… not sure if that got changed, i have complained a bit about it.
the filewalker is very iops dependent, so the slower your storage solution’s iops, the longer it will take.
my 14.5tb node usually takes a few hours, so it’s really annoying to troubleshoot…
but i got large scale caching and high iops, so it’s not too bad… i have done testing running at 1x, 2x and 3x worth of hdd iops, and it basically scales linearly.

so in theory, on a system of say 30TB with the iops of a single 7200 rpm hdd and no caching, the filewalker seems to take upwards of 8 hours or more, tho there are some utilization peaks in the graph… it still takes a long time to finish…

there really isn’t any fix for it, aside from caching or having good read iops.

a poorly configured raid array of 30TB filled with data doing the filewalker might take a day or more to complete…
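a back-of-envelope version of that estimate: walk time ≈ number of piece files / effective metadata reads per second. both inputs below are assumed figures, plug in your own:

```shell
# Crude filewalker-time estimate. FILES and STATS_PER_SEC are assumptions:
FILES=10000000        # e.g. counted with: find .../blobs -type f | wc -l
STATS_PER_SEC=350     # effective metadata IOPS (higher with dentry/inode cache)
awk -v f="$FILES" -v r="$STATS_PER_SEC" \
  'BEGIN {printf "~%.1f hours\n", f/r/3600}'
```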

Well, it says on dashboards that 1.13 is the minimum version:
(dashboard screenshot)

I read somewhere that the oldest version would only have limited support, as in you wouldn’t receive any more data or something like that… Can’t find it again though.

@HeroHann I think you could delay your updates a bit, for instance you could be one version behind. But I wouldn’t go beyond that, as StorjLabs clearly want all nodes to be as up-to-date as possible.

yeah when you go below the minimum required version ingress will go to zero.

The filewalker is a horror on my SMR drives, especially if you are trying to do anything else that causes a large amount of writes (I changed some BTRFS mount settings to disable CoW, and needed to ‘apply’ them to the existing data by copying it to a new spot or running a defrag). The drives slowed down to < 10 MB/s total throughput… It finally finished… The cumulative load of everything was not good; long story short: definitely avoid SMR for Storj if you can, although you can run it… A RAID might help, I haven’t tested that.

That perfectly describes what it does to my SMR drives too ^^’
It will get worse and worse the more data the disks are storing, by the way; maybe something should be done to make this process lighter on drives!

Having several nodes (on several disks, obviously) does help too, as it spreads the load across all disks.

actually, a raid setup with SMR would be about the same… since iops is the SMR hdd’s limitation, and with a standard raid array the iops is the same as with a single hdd, due to all the raid drives writing in lockstep with each other, or whatever one wants to call it.

the best way to utilize SMR drives will be using multiple smr hdds for multiple nodes, so the incoming data is split across them… ofc that doesn’t make the filewalker any less of a problem, tho since each drive would essentially only have half the data, it should, at least in theory, take about half the time.

so if you have a number of SMR drives making multiple nodes is a pretty good fix, tho when they are full the problem will be the same in regard to running the filewalker…
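a sketch of what that looks like in practice — every name, port and path here is a placeholder (second identity, second drive), not something from this thread:

```shell
# Hypothetical second node: its own identity, its own SMR drive, shifted ports.
docker run -d --name storagenode2 \
  -p 28968:28967 -p 14003:14002 \
  -e ADDRESS="my.ddns.example:28968" \
  --mount type=bind,source=/mnt/identity2,destination=/app/identity \
  --mount type=bind,source=/mnt/smr-drive2,destination=/app/config \
  storjlabs/storagenode:latest
# plus your usual WALLET / EMAIL / STORAGE environment variables
```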

raid only increases the read and write bandwidth, not the iops… sadly… would be wonderful if it did, but yeah, bandwidth does increase, giving you hdd MB/s * (number of drives - number of redundancy drives)
so in a raid 6 with 6 drives you get 4 x base speed because 2 of the drives are the redundancy.

plainly put, as always it’s a bit more complex than that.
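the bandwidth arithmetic above as a one-liner — 150 MB/s per drive is just an assumed figure:

```shell
# raid6 rule of thumb: sequential throughput ~= per-drive MB/s * (drives - parity)
awk -v mbs=150 -v n=6 -v parity=2 \
  'BEGIN {printf "~%d MB/s sequential\n", mbs * (n - parity)}'
```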
