This was my expectation for the solution, by the way. Just note that when you file a bug report, you should not suggest a fix: when a customer says there is an issue, they are probably right; when they suggest how to solve it, they usually are not.
But I don't think it would be possible without changing the updater; I have no idea how to wrap it any other way.
The problem is that we use supervisord, which would start both services; the updater would then check the version and try to update the node. However, if the node died during that process, the whole container would exit.
Without changing the updater, the start sequence becomes very complicated:
You still need to download minimal versions of both storagenode and storagenode-updater (otherwise the updater would have no storagenode binary to compare versions against);
You need to start the updater first and not start the node;
The updater should download the new version and run the node, either from a previously downloaded binary or by downloading a new one, unpacking it, and running it;
The updater should download a new version of itself if needed, then either exit or respawn itself in the background to continue the startup procedure;
We should reconfigure supervisord so that from this point it controls both binaries.
So there would probably be two supervisord configurations, one with only the updater and a second with both, and we would have to teach the updater to reconfigure supervisord.
Or only one, but first we would need to run the updater, have it either respawn itself in the background or exit, and then configure supervisord to take over the now-running processes.
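For illustration, the "both binaries" case could look roughly like the following supervisord config. This is only a sketch of the idea; the program names, paths, and priorities are my assumptions, not the actual shipped configuration.

```ini
; Illustrative supervisord config controlling both binaries.
[supervisord]
nodaemon=true

[program:storagenode]
command=/app/bin/storagenode run
autorestart=true
priority=10

[program:storagenode-updater]
command=/app/bin/storagenode-updater run
autorestart=true
priority=20
```

The point of the two-config (or reconfigure-on-the-fly) complication above is exactly that this file cannot list storagenode until the updater has already put a runnable storagenode binary in place.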
I think my PR should solve both issues. It mostly relies on how docker handles containers, so /app/bin should be safe: it is inside the container and must not be subject to any restrictions from the host.
Of course, some exotic setups would still have issues running binaries inside the container. But they are, well, exotic, and the operator must know how to solve that for their isolated use case.
I understand that the binary is either not up to date or a new one will be downloaded. So the docker container version is not strongly coupled to a storagenode version.
We want to update nodes in waves so that we don't shut down the whole network when a new container is released. The first solution, our own forked watchtower with a random update interval between 12 and 72 hours, was not ideal. We wanted to use storagenode-updater, which follows the rolling update plan on https://version.storj.io and is used on all other platforms.
Updating the container becomes a cumbersome task in this case. So we made a light base image which contains only the downloader and the supervisor with its config. The supervisor runs both processes, storagenode and storagenode-updater; storagenode-updater updates the node and itself right in the container when the NodeID is eligible to be updated. So the base image is rarely updated: only when it needs OS security patches, or when we want to fix the issue of storagenode being downgraded after a container restart.
It’s used to update the base image. You may update the base image manually instead, but then you may miss important security updates, so I would still run it.
To make @Alexey’s change possible while ensuring persistence between docker restarts, I created a new patch that adds a --binary-store-dir flag to storagenode-updater.
The docker env equivalent of that flag will be BINARY_STORE_DIR, set to /app/config/bin by default. At the entrypoint we will copy the binary to /app/bin and execute it. This should resolve any permission issues.
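A minimal sketch of that entrypoint copy step, assuming the directories named in this thread; the function name and the loop over both binaries are my own illustration, not the actual entrypoint script:

```shell
# stage_binaries copies the persisted binaries from the (possibly
# host-mounted, possibly restricted) store directory into a
# container-local directory and marks them executable.
# Illustrative sketch only.
stage_binaries() {
    store="$1"   # e.g. /app/config/bin (BINARY_STORE_DIR)
    run="$2"     # e.g. /app/bin, inside the container
    mkdir -p "$run"
    for bin in storagenode storagenode-updater; do
        if [ -f "$store/$bin" ]; then
            cp "$store/$bin" "$run/$bin"
            chmod 0755 "$run/$bin"
        fi
    done
}

# In the container the entrypoint would then call something like:
# stage_binaries "${BINARY_STORE_DIR:-/app/config/bin}" /app/bin
```

Because /app/bin lives only in the container's own filesystem, the copied binaries run even when the mounted store directory forbids execution.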
because it will heal already broken restricted systems. Otherwise we would need one more release before we could use the new version of storagenode-updater with this new flag, and by that time we might lose these restricted systems to downtime.
I updated this PR to cover, I believe, all edge cases; please review.
I prepared a separate PR to support this new option in the updater, but it can be merged only when this new feature is 100% rolled out:
However, this new parameter changes nothing in the logic, which remains the same as in !25 above:
The script will download the minimal version if the existing binary in the binary store needs to be updated;
It will check whether the node is eligible to be updated to the suggested version, and download it if so.
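The "needs to be updated" part of that decision reduces to comparing the stored binary's version against the minimal/suggested versions reported by https://version.storj.io. A tiny sketch of such a comparison in shell (the helper name is mine, not the script's):

```shell
# version_lt succeeds when version $1 sorts strictly before version $2.
# sort -V understands semver-like strings, including a leading "v".
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# e.g. fetch a new binary only when the stored one is older:
# if version_lt "$stored_version" "$suggested_version"; then download; fi
```

The eligibility check itself (whether this particular NodeID falls inside the current rollout wave) is answered by the version server, so the script only has to act on the comparison result.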
storjlabs/storagenode:9109fd2 is working fine on my raspberry pi2 (arm v5) storagenode. The software runs from /app/bin/ inside the docker container, rather than from /app/config/bin/ as in the existing image.