This was my expectation for the solution, by the way. Just note that when you file a bug report, you should not suggest a fix: when a customer says there is an issue, they are probably right; when they suggest how to solve it, they usually are not.
But I don't think it would be possible without changing the updater; I have no idea how to wrap it any other way.
The problem is that we use supervisord, which would start both services; the updater would then check the version and try to update the node. However, if the node died during that process, the whole container would exit.
Without changing the updater, the start sequence becomes very complicated:
You still need to download minimal versions of both storagenode and storagenode-updater (otherwise the updater would have no storagenode binary to compare versions against);
You need to start the updater first and not start the node;
The updater should download the new version and run the node, either from a previously downloaded binary or by downloading a new one, unpacking it, and running it;
The updater should download a new version of itself if needed, then either exit or respawn itself in the background to continue the startup procedure;
We should reconfigure supervisord so that from this point it controls both binaries.
So there would probably be two supervisord configurations, one with only the updater and a second with both, and we would have to teach the updater to reconfigure supervisord.
Or only one, but first we would need to run the updater, have it either respawn itself in the background or exit, and then configure supervisord to take over the now-running processes.
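For illustration, the "both binaries" case could look roughly like the following supervisord config. This is only a sketch of the idea; the program names, paths, and priorities are my assumptions, not the actual shipped configuration.

```ini
; Illustrative supervisord config controlling both binaries.
[supervisord]
nodaemon=true

[program:storagenode]
command=/app/bin/storagenode run
autorestart=true
priority=10

[program:storagenode-updater]
command=/app/bin/storagenode-updater run
autorestart=true
priority=20
```

The point of the two-config (or reconfigure-on-the-fly) complication above is exactly that this file cannot list storagenode until the updater has already put a runnable storagenode binary in place.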
I think my PR should solve both issues. It mostly relies on how docker handles containers, so /app/bin should be safe: it is inside the container and must not be subject to any restrictions from the host.
Of course, some exotic setups would still have issues running binaries inside the container. But they are, well, exotic, and the operator must know how to solve that for their isolated use case.
I understand that the binary is either not up to date or a new one will be downloaded. So the docker container version is not strongly coupled to a storagenode version.
We want to update nodes in waves so that we don't shut down the whole network when a new container is released. The first solution, our own forked watchtower with a random update interval between 12 and 72 hours, was not ideal. We wanted to use storagenode-updater, which follows the rolling update plan on https://version.storj.io and is used on all other platforms.
Updating the container becomes a cumbersome task in this case. So we made a light base image which contains only the downloader and the supervisor with its config. The supervisor runs both processes, storagenode and storagenode-updater; storagenode-updater updates the node and itself right in the container when the NodeID is eligible to be updated. So the base image is rarely updated: only when it needs OS security patches, or when we want to fix the issue of storagenode being downgraded after a container restart.
It’s used to update the base image. You may update the base image manually instead, but then you may miss important security updates, so I would still run it.
To make @Alexey’s change possible while ensuring persistence between docker restarts, I created a new patch that adds a --binary-store-dir flag to storagenode-updater.
The docker env equivalent of that flag will be BINARY_STORE_DIR, set to /app/config/bin by default. At the entrypoint we will copy the binary to /app/bin and execute it. This should resolve any permission issues.
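A minimal sketch of that entrypoint copy step, assuming the directories named in this thread; the function name and the loop over both binaries are my own illustration, not the actual entrypoint script:

```shell
# stage_binaries copies the persisted binaries from the (possibly
# host-mounted, possibly restricted) store directory into a
# container-local directory and marks them executable.
# Illustrative sketch only.
stage_binaries() {
    store="$1"   # e.g. /app/config/bin (BINARY_STORE_DIR)
    run="$2"     # e.g. /app/bin, inside the container
    mkdir -p "$run"
    for bin in storagenode storagenode-updater; do
        if [ -f "$store/$bin" ]; then
            cp "$store/$bin" "$run/$bin"
            chmod 0755 "$run/$bin"
        fi
    done
}

# In the container the entrypoint would then call something like:
# stage_binaries "${BINARY_STORE_DIR:-/app/config/bin}" /app/bin
```

Because /app/bin lives only in the container's own filesystem, the copied binaries run even when the mounted store directory forbids execution.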
because it will heal already broken restricted systems. Otherwise we would need one more release before we could use the new version of storagenode-updater with this new flag, and by that time we might lose these restricted systems to downtime.
I updated this PR to cover, I believe, all edge cases; please review.
I prepared a separate PR to support this new option in the updater, but it can be merged only when this new feature is 100% rolled out:
However, this new parameter changes nothing in the logic, which remains the same as in !25 above:
The script will download the minimal version if the existing binary in the binary store needs to be updated;
It will check whether the node is eligible to be updated to the suggested version, and download it if so.
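The "needs to be updated" part of that decision reduces to comparing the stored binary's version against the minimal/suggested versions reported by https://version.storj.io. A tiny sketch of such a comparison in shell (the helper name is mine, not the script's):

```shell
# version_lt succeeds when version $1 sorts strictly before version $2.
# sort -V understands semver-like strings, including a leading "v".
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# e.g. fetch a new binary only when the stored one is older:
# if version_lt "$stored_version" "$suggested_version"; then download; fi
```

The eligibility check itself (whether this particular NodeID falls inside the current rollout wave) is answered by the version server, so the script only has to act on the comparison result.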
storjlabs/storagenode:9109fd2 is working fine on my raspberry pi2 (arm v5) storagenode. The software runs from /app/bin/ inside the docker container, rather than from /app/config/bin/ as in the existing image.