Hello,
I have been observing the rollouts of the latest versions, and I believe the current pattern can result in unexpected downgrades for some nodes (at the very least for nodes running in docker).
Note that while this topic is similar to “Are automatic node downgrades expected?”, its purpose is different, so I thought it would be better to open a new topic.
The pattern that I have observed is the following:
- The last rollout was for versions 1.99.3 (min) to 1.101.3 (suggested). It completed without any issues and the cursor reached FFFFF… .
- The current rollout has a minimum of 1.99.3 and a suggested version of 1.102.3, and currently has a cursor of 19999… .
- If a container running the storagenode software is recreated (not restarted) for any reason, then by default the docker image will fetch the appropriate version. As the rollout for 1.102.3 has not been completed, there is a high chance that a recreated node will fetch 1.99.3 instead of 1.102.3, resulting in a downgrade from 1.101.3 → 1.99.3.
- This is not an issue for restarted nodes, as they should not explicitly downgrade if the cursor hasn’t reached them. It is an issue when recreating or reinstalling the node software (at least in docker, where you do not select a specific version), as the node might fetch and run the minimum version, which is older than the version it was running previously.
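The pattern above can be sketched as a small model. This is only an illustration of my understanding, not the actual storagenode or updater code: the function names are hypothetical, and the eligibility rule (a keyed hash of the node ID compared against the cursor) is an assumption about how the cursor selects nodes.

```python
import hashlib
import hmac


def parse_version(text):
    """Turn '1.101.3' into (1, 101, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def cursor_fraction(cursor: bytes) -> float:
    """Share of the hash space at or below the cursor (assumed meaning of the cursor)."""
    return int.from_bytes(cursor, "big") / (2 ** (8 * len(cursor)) - 1)


def in_rollout(node_id: bytes, seed: bytes, cursor: bytes) -> bool:
    """Assumed rule: a node is eligible once its keyed hash is at or below the cursor."""
    digest = hmac.new(seed, node_id, hashlib.sha256).digest()
    return digest <= cursor


def version_to_run(current, minimum, suggested, eligible):
    """Pick the version a node ends up on; current is None for a recreated container."""
    if eligible:
        return suggested
    # A recreated container has no running version to keep, so it falls back
    # to the minimum. A running node only moves if it is below the minimum.
    if current is None or parse_version(current) < parse_version(minimum):
        return minimum
    return current


# Cursor from the current rollout, padded to 32 bytes for illustration.
partial_cursor = bytes.fromhex("19" + "99" * 31)
share = cursor_fraction(partial_cursor)  # roughly 0.1, i.e. about 10% of nodes

restarted = version_to_run("1.101.3", "1.99.3", "1.102.3", eligible=False)
recreated = version_to_run(None, "1.99.3", "1.102.3", eligible=False)
```

Under these assumptions, a restarted node that the cursor has not reached stays on 1.101.3, while a recreated one ends up on 1.99.3.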
I have experienced this behaviour before, because with my current setup I sometimes have to delete and recreate containers during maintenance.
SNOs can avoid the issue by not recreating nodes in the middle of a rollout.
I wanted to bring this up because, from what I understand, downgrading storagenode versions can be problematic, as it is not usually tested. This may be even more of a concern with 1.99.3 and 1.101.3, as they brought changes to the trash and garbage mechanisms.
To avoid this kind of situation, I think it might be beneficial to ensure that the minimum version of any rollout is the suggested (or target) version of the previous rollout. While I’m not sure if this is something that could be changed right now, it could be useful for future rollouts (with the exception of planned downgrades due to issues found with new versions).
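The rule suggested above could be checked mechanically before a rollout starts. A minimal sketch, assuming rollouts are available as (minimum, suggested) pairs in chronological order; the function name and data layout are hypothetical and not part of any existing tooling:

```python
def parse_version(text):
    """Turn '1.101.3' into (1, 101, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_downgrade_gaps(rollouts):
    """Return (previous suggested, next minimum) pairs where the minimum is older."""
    gaps = []
    for (_, prev_suggested), (next_minimum, _) in zip(rollouts, rollouts[1:]):
        if parse_version(next_minimum) < parse_version(prev_suggested):
            gaps.append((prev_suggested, next_minimum))
    return gaps


# The two rollouts described in this post.
observed = [("1.99.3", "1.101.3"), ("1.99.3", "1.102.3")]
gaps = find_downgrade_gaps(observed)

# The same rollouts with the proposed rule applied.
proposed = [("1.99.3", "1.101.3"), ("1.101.3", "1.102.3")]
no_gaps = find_downgrade_gaps(proposed)
```

With the observed rollouts the check reports one gap, from 1.101.3 down to 1.99.3. With the minimum raised to 1.101.3 it reports none.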
Apologies if I made incorrect assumptions in some part of my explanation, or if the whole premise is wrong due to something I missed. I’m simply trying to help the team with something I believe could be useful.
Thank you for reading.