Storage node issue with versioncontrol, minimum version downgraded as a workaround

We made a mistake with versioncontrol. It only affects users that don’t use the storagenode-updater. If you are using the storagenode-updater or the docker image you should be safe.

Here is what happened:
Until a few days ago, storage nodes were allowed to run older versions. At some point, an outdated storage node will no longer get selected for uploads but will otherwise still respond to download requests. The satellite will send an email warning. A few versions later, the storage node binary might refuse to start. We have this in place to make sure possible error messages on the customer side are not caused by outdated storage node versions. We want to keep the storage node version in a reasonable range and shut down the nodes that are too old.
Yesterday we deployed a change that now kills outdated storage nodes way too early. We are working on a fix but in the meantime we also want to keep these nodes in the network.
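
To make the intended staging easier to picture, here is a minimal Python sketch of the three stages described above (current, outdated but still serving downloads, too old to start). It is not the actual versioncontrol or storage node code. The function names, the `NodeState` enum, and both threshold parameters are hypothetical and exist only for illustration.

```python
from enum import Enum


def parse_version(text):
    """Turn '1.94.2' or 'v1.94.2' into a comparable tuple such as (1, 94, 2)."""
    major, minor, patch = text.strip().lstrip("v").split(".")
    return int(major), int(minor), int(patch)


class NodeState(Enum):
    CURRENT = "selected for uploads and downloads"
    OUTDATED = "not selected for uploads, still serves downloads"
    TOO_OLD = "binary might refuse to start"


def classify(node_version, upload_minimum, start_minimum):
    """Classify a node version against two hypothetical thresholds.

    upload_minimum: oldest version still selected for uploads (assumption).
    start_minimum: oldest version still allowed to start (assumption).
    """
    node = parse_version(node_version)
    if node < parse_version(start_minimum):
        return NodeState.TOO_OLD
    if node < parse_version(upload_minimum):
        return NodeState.OUTDATED
    return NodeState.CURRENT
```

With made-up thresholds, `classify("1.93.2", "1.94.0", "1.90.0")` lands in the middle stage. The bug described above corresponds to the start threshold sitting much closer to the newest release than intended.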

Workaround:
As a workaround, we decreased the minimum version in versioncontrol. That should hopefully allow older nodes to come back online while we are working on a fix.

Possible side effects:
We didn’t downgrade the minimum version enough to match the old behavior. If you are running some kind of self-built updater, please double-check your storage node version. Also be aware that, for the moment, versioncontrol doesn’t return the recommended version; it points to an older storage node version instead.
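
For anyone with a self-built updater, one possible safeguard is to install only versions that are strictly newer than the one already running. The Python sketch below shows just that comparison. `version_tuple` and `should_install` are hypothetical helper names, and how your updater obtains the suggested version from versioncontrol is deliberately left out. Refusing downgrades is an assumption about what you want, not official guidance.

```python
def version_tuple(text):
    """Convert a version string such as '1.94.2' or 'v1.94.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def should_install(running, suggested):
    """Return True only when the suggested version is strictly newer than the running one."""
    return version_tuple(suggested) > version_tuple(running)
```

With the versions mentioned in this thread, `should_install("1.94.2", "1.93.2")` returns `False`, so a node already on 1.94.2 would stay where it is while versioncontrol points to the older version.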

Is one of you running a docker node for the QA satellite by chance? I think we haven’t tested what happens if we restart a docker node.

It turns out @clement is running a docker node. We need a few more minutes to make sure the workaround doesn’t make it worse.

Docker nodes will downgrade on restart, but since there are not a lot of code changes they will continue to work fine. The workaround will get deployed shortly.

My Windows docker node seems to be dying every few hours. This is the 4th time I have had to restart Docker Desktop. The node's current version is 1.94.2.

Do you have any details about the reason it is crashing?

So far nothing out of the ordinary in the logs.

2024-01-18T14:09:52Z INFO Got a signal from the OS: "terminated"        { "storagenode"}
2024-01-18T14:09:52Z INFO lazyfilewalker.used-space-filewalker.subprocess Got a signal from the OS: "terminated"

This is the 5th restart. I am going to reinstall Docker Desktop.

Edit:

Confirming node version got downgraded to 1.93.2

all nodes back online, comment deleted :slight_smile: