Keeping your node up to date, changes to storage node software! PLEASE READ this thread!

brandon · October 15, 2020, 7:08pm

Hello Storage Node Operators!!

We’ve got some important changes to the Storage Node Software and the network and it’s important that all of our SNOs are aware of and understand these changes.

First, it’s vitally important that your storage node software automatically updates. If you have installed your storage node through our official instructions (either via our Windows installer or official Docker instructions using Watchtower), you have nothing to worry about and your node will automatically update. Once our new Linux installer is complete, automatic updates will also be supported there.

Why is this so important? Here’s an example: in the next deployment, we will be changing how Satellites process order submissions for bandwidth payouts. Storage nodes on version 1.12 or newer already support this new mechanism, and if you are running one of our automatic update mechanisms, you already have 1.12. After our next deployment storage nodes older than v1.12 will no longer get paid for bandwidth!

This type of network change will continue to happen. While we are going to continue to do everything we can to make automatic updates as easy as possible (and again, official installation mechanisms should have this already set up for you!), we need everyone’s help to keep storage node software running the latest stable release.

So! Here are the upcoming network policy changes:

We are going to stop having Tardigrade satellites upload new data to nodes that are two minor releases behind. If you’re running 1.11, but 1.12 is the latest complete rollout (we’re currently rolling out 1.14), then you will continue to get new uploads, but 1.10 and older will not. This is a form of node suspension.
We are going to start disqualifying nodes that are 3 minor releases behind. In the above example, 1.9 and older will get DQed. We will also be ensuring that the standard source code release will simply refuse to start if it’s 3 minor releases behind.

The minor release schedule for us is 2 weeks, so, this means storage nodes have 4 weeks to upgrade to the latest release.
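As a rough illustration of the two thresholds above, here is a minimal sketch that classifies a node version against the latest completely rolled-out release. The function name and the status labels are made up for this example, and it assumes both versions share the same major number and look like MAJOR.MINOR[.PATCH]; it is not how the Satellites actually implement the policy.

```shell
# Classify a node version against the latest fully rolled-out release.
# Assumption: same major version, format MAJOR.MINOR[.PATCH], optional "v" prefix.
# The status labels are hypothetical names for this sketch.
policy_status() (
    node=${1#v}
    latest=${2#v}
    # Extract the minor number: drop everything up to the first dot,
    # then everything from the next dot onward.
    node_minor=${node#*.};     node_minor=${node_minor%%.*}
    latest_minor=${latest#*.}; latest_minor=${latest_minor%%.*}
    behind=$((latest_minor - node_minor))
    if [ "$behind" -ge 3 ]; then
        echo "disqualified"
    elif [ "$behind" -ge 2 ]; then
        echo "no-new-uploads"
    else
        echo "ok"
    fi
)
```

For example, `policy_status 1.11.0 1.12.0` prints `ok`, while `policy_status 1.10.2 1.12.0` prints `no-new-uploads`.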

Being able to provide the best service to both Tardigrade users and Storage Node Operators is our top priority, and being able to do so requires that we can continue to improve and upgrade the network in relative tandem.

Thanks for your help!

Footnote: We strongly recommend that Storage Node Operators use our automatic update systems, but obviously can’t require it, as the storage node software is open source. We don’t recommend building your own system for this, but if you need to, use the data returned by curl -s https://version.storj.io | jq .processes.storagenode.minimum.version. That is how our software determines what release is the latest stable release (jq .Storagenode is deprecated).
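For anyone who does build their own check, a minimal sketch of comparing a local version with the published minimum might look like this. The helper names are hypothetical, and the comparison assumes plain MAJOR.MINOR.PATCH numbers (optionally prefixed with "v") with each part below 1000; treat it as a starting point rather than the official mechanism.

```shell
# Print the minimum storagenode version from the version server.
# Same query as in the footnote, with jq -r added to drop the JSON quotes.
# Needs curl and jq; not called automatically.
minimum_version() {
    curl -s https://version.storj.io | jq -r .processes.storagenode.minimum.version
}

# Turn MAJOR.MINOR.PATCH into one comparable integer (hypothetical helper).
# Assumes purely numeric parts, each smaller than 1000.
version_key() (
    v=${1#v}
    IFS=.
    set -- $v
    echo $(( $1 * 1000000 + ${2:-0} * 1000 + ${3:-0} ))
)

# Succeed (exit 0) when version $1 is older than version $2.
version_lt() {
    [ "$(version_key "$1")" -lt "$(version_key "$2")" ]
}
```

A check could then be written as `version_lt "$my_version" "$(minimum_version)" && echo "update needed"`, where `my_version` is whatever version your node reports.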

kevink · October 15, 2020, 7:24pm

I understand that having up to date nodes is important and there will be breaking changes in multiple releases, so efforts to keep nodes up to date are required.

Then I highly recommend sending an email to all SNOs!

Well as if they never failed before… (especially on windows lol, I don’t use windows though)
makes this leave an especially bad taste:

another way to get DQed… great. At least you’ll have 6 weeks until you get DQed, so that’s certainly more manageable than audit failures.

Thanks for the info. I might make myself a notifier from that so I can update my node as soon as the update is available on docker. I kind of forget to check occasionally, since there’s some time between when the changelog gets posted and when the image is available on docker.

deathlessdd · October 15, 2020, 7:33pm

There should also be some kind of alert on the dashboard, since a lot of people depend on the dashboard rather than emails. Most probably don’t even use a legit email address because of privacy concerns.

It should say something like “Alert: please update by this date or you will be either suspended or DQed”,
since a lot of people don’t use this forum.

SGC · October 15, 2020, 7:59pm

tho i understand why and what advantage this model has, i do question it a bit…

in the last week or maybe 9 days or whatever we have had 3 “minor” releases… i know the last one technically wasn’t an update… but i could see how this could quickly run off the rails…

sure if the update schedule becomes a bit more stable, predictable one could even call it…
maybe the last few updates have been around the 14 day mark… so maybe it’s fine…

but hey if that’s what it takes to get the network running better, then it’s most likely a good idea… it’s just sometimes a fine line between it being the worst best idea ever and the best idea ever.

also… have you guys actually tried to simulate this …
i mean if we imagine that every so often an update goes wrong after the windows release, then gets rolled back, skipped and we then get a new new version, but i guess that’s what the minor minor release numbers are for then…

anyways i really don’t know enough about this stuff to really evaluate it, but i really hope somebody looked it over real well and took into account what would have happened if this had been on while the last 6 months of updates were going on…

because the versions really do seem to jump around a lot… and then stop for like a month and then 3 updates in like 2 weeks… that behavior and storj releasing this… kinda makes me feel like the drunk guy at the airport is turning out to be my pilot…

no offence, but are you guys sure you are ready for doing this?

Pentium100 · October 15, 2020, 8:17pm

I get an alert when there is a new version available on docker, but now I have also added the .processes.storagenode.minimum.version check as well. In theory I should get two SMS when I have to update.

I do not want to automatically update because I do not want to wake up and see that the node has been offline for many hours because of a failed update. I also do not want to get woken up by a node going offline just because the automatic update software decided that 03:00 on a workday is the best time to update and then ran into a problem.
Also, even if it were possible to set a schedule of the times the software was allowed to update, there would be no real way to test whether it works other than waiting for an update to come out and seeing what happens, which may not happen at a convenient time.
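The kind of schedule described here could be sketched as a simple hour-window check. This is a hypothetical helper, not a feature of Watchtower or the storage node software, and it only covers the "is now an allowed time" part, which can at least be exercised on its own even if the full update path cannot.

```shell
# Succeed (exit 0) when hour $1 (0-23) lies in the allowed window [$2, $3).
# A window such as 22 to 6 wraps around midnight.
# Hypothetical helper for a self-built updater.
in_update_window() {
    hour=$1; start=$2; end=$3
    if [ "$start" -le "$end" ]; then
        [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
    else
        [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
    fi
}

# Example wiring (not executed here):
#   h=$(date +%H); h=${h#0}        # strip a leading zero, e.g. "08" -> "8"
#   in_update_window "$h" 9 17 && echo "updates allowed right now"
```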

Is there any way of finding out what the “two minor releases behind” version is? I would set up a third alert that would compare current version with that.

kevink · October 15, 2020, 8:23pm

exactly. and then you might as well update manually if you’re just sitting there watching the automatic update…
I’d prefer an automatic update only to not fall below the minimum version so basically as a backup in case I forget to update manually.
How do you get notified about a new image version?

Pentium100 · October 15, 2020, 8:40pm

My actual script is a bit more complicated to deal with errors, but the basic idea is this.

# Get an anonymous pull token for the storjlabs/storagenode repository.
t=$(/usr/bin/curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:storjlabs/storagenode:pull" | jq -r .token)
# Print the creation time of the image currently tagged "latest".
/usr/bin/curl -s -H "Authorization: Bearer $t" https://index.docker.io/v2/storjlabs/storagenode/manifests/latest | jq '.history[0].v1Compatibility | fromjson | .created' 2>/dev/null

It shows the time of the latest docker image (right now it’s "2020-10-10T21:59:00.263587582Z"). When it changes, I get an alert.

I now added another script:

/usr/bin/curl -s https://version.storj.io | jq .processes.storagenode.minimum.version

When that changes I’ll also get an alert.
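The "alert when it changes" part is not shown above, so here is one way it might be done, as a minimal sketch with a state file. The function name, the state file path, and send_alert are placeholders for this example, and empty values (for instance from a failed curl) are deliberately ignored.

```shell
# Succeed (exit 0) when the value differs from the one remembered in the
# state file, and remember the new value. The first non-empty value counts
# as a change. Empty values are ignored so a failed fetch does not alert.
value_changed() {
    new_value=$1
    state_file=$2
    old_value=""
    [ -n "$new_value" ] || return 1
    [ -f "$state_file" ] && old_value=$(cat "$state_file")
    if [ "$new_value" = "$old_value" ]; then
        return 1
    fi
    printf '%s\n' "$new_value" > "$state_file"
    return 0
}

# Example wiring (not executed here; send_alert stands in for your own SMS or e-mail command):
#   v=$(/usr/bin/curl -s https://version.storj.io | jq .processes.storagenode.minimum.version)
#   value_changed "$v" /var/tmp/storagenode-minimum.txt && send_alert "minimum version is now $v"
```

Run from cron, this would alert once per change rather than on every run.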

I guess the only difference would be if I was unavailable or forgot, but then maybe I’ll just run the older version for a week until I’m available.
I guess an automatic update at the last moment just to avoid DQ would be preferable (well, the node would be DQed anyway, so nothing to lose).

SGC · October 15, 2020, 8:45pm

can’t i just turn off watchtower and turn it on when i want it to update…

yeah the idea that it will eventually default to automatic update to survive would make perfect sense…
i would without a doubt have that set up… i mean it doesn’t always go bad… besides a failed update doesn’t crash the server, so it can still call for help if the node is offline due to a failed update… or try other stuff such as reboots

Alexey · October 15, 2020, 8:48pm

Yes. Just do not run it too late. It will check for updates in a random interval between 12 and 72 hours after start.
But doing that would make no more sense than a manual update, and you risk being DQed if you do not update in time.
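To make the timing concrete, here is a small sketch of what a 12 to 72 hour random interval implies. It only illustrates the numbers mentioned above and is not Watchtower's actual code; both function names are made up for this example.

```shell
# Pick a random delay in minutes between 12 and 72 hours (720..4320 inclusive).
# Illustration only; this is not how Watchtower is implemented.
random_check_delay_minutes() {
    awk 'BEGIN { srand(); print 720 + int(rand() * 3601) }'
}

# Given a deadline in epoch seconds, print the latest start time that still
# leaves room for the worst-case 72 hour wait before the first check.
latest_safe_start() {
    echo $(( $1 - 72 * 3600 ))
}
```

In other words, under these assumptions an updater started less than three days before a deadline might not check in time.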

SGC · October 15, 2020, 8:51pm

i’m not averse to automatic updates… i just like to be able to fix stuff when it goes wrong… thus a bit of manual control… but automatic if nothing else works… or is done…

Pentium100 · October 15, 2020, 8:52pm

That’s bad, I might as well write my own script that does not have this delay.
I would love to have the “minimum non-DQ version” available on the API or something. I could use that to automatically trigger an update script (because I took too long to update manually).

Alexey · October 15, 2020, 8:54pm

Could you please create a feature request instead?

joesmoe · October 16, 2020, 3:34am

Aren’t the updates supposed to be built into the docker image soon enough? That would make this conversation irrelevant.

I think DQ’ing for 3 versions behind is okay, but as mentioned before, we went through 3 versions in like a week or something…

Pentium100 · October 16, 2020, 4:57am

That will be fun - the node causing its own downtime at random.

andrew2.hart · October 16, 2020, 5:23am

I am a little annoyed by this, however, I am running your watchtower now.

This means there is nothing left for me to do…

SGC · October 16, 2020, 5:28am

so i got to thinking, this means that the storagenode dependencies can also get storagenodes DQed, like say if my OS, docker, watchtower, linux updater, aren’t up to the latest and the greatest on the storagenode, it will be DQed. where exactly do we find the required versions of all these dependencies… in one location please.

with maybe a bit of a schedule of what versions will be required for what stuff in advance, can’t just expect us to update our OS on a brief notice or such…

i so think all of this is going to explode… i’m just going to have to stay exactly a week behind the first updaters to avoid your usual issues with updating, and then not too far behind so that i don’t get destroyed…

almost like a game, but it also kinda makes me think even more that this seems like a way to avoid paying people. this seemed so relaxed at first, and as time goes on it’s like we are getting boxed in to possible DQ no matter where we look…

i really need to start looking into doing more projects parallel with this…
there i said it… i’m sorry but i cannot help but feel that people get DQed for nothing sometimes… the forum is full of people not knowing why they got DQed.

sure quick updates can help speed up the development, i can’t help but think that one day this network is going to crash hard, not because i think this might be what kills it… but there just seems so many things that could go wrong and the more i learn the more avenues for that to happen i seem to see…

maybe it’s just me not understanding it

nerdatwork · October 16, 2020, 6:05am

How did you get to that conclusion ?

deathlessdd · October 16, 2020, 9:25am

I think you need to stop and take a deep breath and then read the post, and then not write a wall of text with your every thought process on updating a node and OS docker updates that no one wants to read…

SGC · October 16, 2020, 10:51am

i did and last night i was reading a thread with somebody that got suspended because his docker version was outdated, so having a specific location where we could see the required software versions we need to have installed might be prudent, since we now get DQ for not keeping up to date…

Lets see…

that we are allowed to have as many ip’s as we like to just one server, which i guess could be mitigated, but still seems like one more way for stuff to run off the rails…
because i most likely don’t really understand enough about how erasure coding works
because science and solar flares…
lets call this one natural disaster, might not be an issue in the later stages of the network, but for now i think they could still pose a threat, maybe not for a complete destruction of the network… but at the very least cause data to be inaccessible when combined with all the other factors in play.
that storj labs trend towards DQing nodes seems to be ever increasing, allowing data to be simply trashed from the network, to my mind seems like working towards losing data.
that a large majority of SNO’s are focused in certain geographical locations… like say germany
that it’s quite common to see people on the forum getting DQed left and right without understanding why, which will wear out the supply of potential SNO’s eventually
that now they will start to DQ nodes that don’t keep inside a semi tight margin of versions on multiple pieces of software
that the coordination of the project seems fairly chaotic and often stuff is released without proper verification that it’s stable and working

don’t really think it will be one little thing that crashes the network, nor do i think it will be permanent, but at one point something big will happen and then maybe combined with a few of the minor things, and it all crashes, because big parts of a nation or the world goes dark due to lets say a big solar flare.

it’s a good system and a good idea and maybe a future crash could be averted, i just don’t feel thats where the network is heading with storj labs requirements getting more and more intensive all the time and DQ lurking around every corner.

even forgot about satellites which is ofc another single point of failure atm…
and i’m sure the people that really understand the network could come up with many more ways in which it could be exposed to conditions that could make it crash, and should be able to see the same movement towards the ledge that i do…