When a new version comes out… it looks like a couple nodes initially get it (as expected)… most stay on the previous version (as expected)… and some go backwards a version (unexpected)?
For example: when 1.101.3 came out… I saw my nodes slowly upgrade over time from 1.99.3 (I think there were a couple 1.100’s in there: but that version got pulled?). As of yesterday all my nodes were 1.101.3 (I still have browser tabs open showing them). Yet this morning I see a few of my nodes (5%?) on 1.102.3 (which is fine), but many of them went backwards to 1.99.3 (about 15%?).
So I went from 100% 1.101.3… to 15% 1.99.3 / 80% 1.101.3 / 5% 1.102.3 (rough numbers)
Should the auto-upgrade feature ever take a node backward a version?
I think I saw this moving from 1.95.x as well, but didn’t record anything, and assumed I was crazy. Note: some nodes get restarted if their Internet connection flakes: so they may have restarted-to-recheck-versions… but I haven’t checked if all the ones that downgraded did get restarted.
Update: I think the downgraded nodes do correspond to ones that were restarted. And the script I have fixing things does a docker stop/remove/compose, not a stop/compose. So it’s entirely possible the system is doing this to itself: wiping out the container discards any intermediate version, so upon restart the node only thinks it should be at the minimum version.
I don’t really know what I’m talking about: but it feels like the upgrade logic says “If you’re not on the list for the highest version… then use the lowest version” (and ignores any newer-than-minimum version in-between, so it rolls it back)?
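If that guess is right, the selection might look something like this minimal sketch (pure speculation from the observed behavior, not actual Storj code; the function and parameter names are mine):

```python
# Suspected version-selection logic, sketched from observed behavior.
# On a fresh container there is no locally remembered version, so a node
# that is not inside the rollout cursor falls all the way back to the
# advertised minimum, skipping any in-between version it used to run.
def pick_version(minimum: str, suggested: str, in_rollout: bool) -> str:
    if in_rollout:
        return suggested  # e.g. 1.102.3 for the ~5% of selected nodes
    return minimum        # e.g. 1.99.3; 1.101.3 is never considered

# A node previously on 1.101.3 that restarts without being selected
# would therefore come back at 1.99.3.
print(pick_version("1.99.3", "1.102.3", False))  # prints 1.99.3
```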
I don’t know exactly how the update works, but I only saw downgrades when I removed and recreated the container because I changed some parameter. So if you just restart the machine, I would expect no downgrade, only a restart with the same version or newer. This is my logic, but maybe I’m wrong.
Yes, I just had to make an edit to my initial post: the script I have repairing the nodes is removing them (which is wrong). So I can understand why fresh nodes may be at a lower version.
Marking this as resolved, since apparently I was shooting myself in the foot. Leaving it up for others to find in the future.
This was seen after 1.97.2 started crashing nodes after updating. Storjlings downgraded the minimum allowed version to stop the rollout. Any other time, if you stop/remove/start the container (emphasis on “remove”), it should NOT downgrade your node’s version. I think the logic to revert a node back to the allowed minimum version is still implemented, which is why your node(s) were downgraded. I would wait for Elek to give his expert comments/solution.
I saw the exact same behavior today. All my nodes were 1.101.3 and after a server restart I had some with 1.99.3, some with 1.101.3 and some with 1.102.3…
(All docker nodes)
Same thing happened to me. According to my notes, my docker node was running 1.101.3 on April 17. At some point in the last couple of days it downgraded itself to v1.99.3. Like other people in this thread, I also recently restarted my node after changing a setting.
Does the built-in node updater or watchtower generate any logs that would help explain what is going on?
The watchtower would update only the base image, which is rare, but not the node inside. The container will download a new version according to version.storj.io and the cursor in it, so I guess @Mitsos is right.
Pinged the team.
No, they are not. Only in case of emergency, which is rare.
This is an excellent comment, and the full comment is an excellent problem description.
Before ~~1.102~~ 1.101, accidentally the ~~1.101~~ 1.100 rolled out for 6% of the nodes. It didn’t have the GC fix, so the rollout has been stopped. But instead of directly starting the rollout of ~~1.102~~ 1.101, for a shorter period of time, it was reverted to the original version.
We will do our best to avoid similar downgrades in the future.
I have serious doubts about that number, since all 14 of my nodes were updated to v1.101.3. That is technically possible if only 6% were updated, but at a chance of 0.00000000000000078%, I would consider that a statistical impossibility.
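For reference, that figure follows from treating each node’s selection as an independent 6% chance:

```python
# Probability that all 14 nodes land in a rollout that only covers 6% of
# nodes, assuming each node is selected independently.
p_all = 0.06 ** 14
print(f"{p_all * 100:.2e} %")  # prints 7.84e-16 %
```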
Do you know why the minimum version wasn’t bumped on https://version.storj.io/ to v1.101.3 before the rollout of v1.102.3 started? Because this is what’s causing the rollbacks now on many nodes that restart. Seems like a mistake.
It was reset though. The cursor is now 3fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff. So anyone who isn’t selected for v1.102.3 yet will be downgraded to v1.99.3 on a restart of the docker container.
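As a rough sketch of how such cursor-based selection is commonly described (an assumed scheme; the actual storagenode-updater hashing details may differ, and all names here are illustrative):

```python
import hashlib

# Hypothetical cursor-based rollout check: hash the node ID together with a
# per-rollout seed, and treat the node as selected only if the hash falls at
# or below the cursor. This is NOT the actual Storj implementation, just an
# illustration of the mechanism.
def eligible(node_id: bytes, seed: bytes, cursor_hex: str) -> bool:
    digest = hashlib.sha256(seed + node_id).hexdigest()
    return int(digest, 16) <= int(cursor_hex, 16)

# A cursor of 3fff...f covers roughly the lowest quarter of the hash space,
# so only about 25% of nodes would be selected; everyone else falls through
# to the minimum version on a fresh container start.
cursor = "3f" + "ff" * 31
fraction = int(cursor, 16) / int("ff" * 32, 16)
print(round(fraction, 2))  # prints 0.25
```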
It is my understanding that this version isn’t used to show what’s mandatory, but I’m not entirely certain. Last time I heard about it the version for which ingress stopped wasn’t visible here.
I agree, this should be managed by you guys. There are definitely scenarios where a downgrade (when intentional) should be able to propagate to all nodes. But this one is extra painful, as nodes which migrated to date-based trash folders will be downgraded to a node version unaware of those. This means that 1.99 won’t clean up trash in date-based folders, and likely the migration won’t work again when upgrading to 1.102, because the file signaling that date-based trash is used already exists. Leaving trash generated by 1.99 in the meantime to possibly stick around forever.
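To illustrate that migration concern with a hedged sketch (the flag-file name and logic are invented for the example; the real storagenode uses its own markers):

```python
from pathlib import Path

# Illustration of a one-shot migration guarded by a flag file. If the flag
# survives a downgrade, the re-upgrade skips migration even though the old
# version wrote trash outside the date-based layout in the meantime.
def migrate_trash(trash_dir: Path) -> bool:
    flag = trash_dir / ".date-based"  # assumed marker name, not Storj's
    if flag.exists():
        return False                  # migration skipped: flag already set
    # ... here the real code would move existing trash into per-date folders ...
    flag.touch()
    return True
```

Calling it a second time returns False without doing anything, which is exactly the failure mode described above after a downgrade/upgrade cycle.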