Storagenode Docker Update: Changes To Binaries Location For Persistence Between Restarts

He has a point though… a good email to SNOs and a month's notice would be helpful for all important changes. See this case:
https://forum.storj.io/t/spawnerr-no-permission-to-run-command/27962?u=snorkel
What if that guy was on a cruise or couldn't access his nodes for a month? They would all get DQed by Storj's mods.

1 Like

I don't think we would change the way of communication.

Maybe we could add active notifications to the storagenode dashboard, but that is not on the roadmap either. The current ones are almost static or provided by the node itself (like a DQ message or a suspension message).

no, it’s not. But you may change the binary location in multiple ways:

You may also mount this location from a different drive or even a docker volume. The latter would also solve any permissions issues, I believe.
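For example, a minimal sketch using a named volume (the /app/config/bin destination is the binaries path discussed in this thread; the rest of your usual run command stays as-is):

    # create a named volume to hold the binaries; it survives container recreation
    docker volume create storagenode-bin
    # then add this mount to your usual docker run command:
    #   --mount type=volume,source=storagenode-bin,destination=/app/config/bin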

Changes that potentially affect node functionality in a bad way should be optional, not forcibly imposed on good old working nodes. Many people don't keep an eye on their nodes or their performance, don't read the forum, and expect the nodes to keep working once set up and tuned to their particular environment, and not to be broken by Storj mods or updates.
This is the second time a change turned on by default has negatively impacted the nodes:

  1. The lazy filewalker - a bad and untested addition that many had to turn off.
  2. The location of the storagenode binary, which has already broken some nodes.

Both were announced by the same person, but I don't want to point fingers.
Why does this trend continue?
When you make a change, announce it, make it optional, let people test it, and after some time, like months, if all is well and dandy, then you can turn it on by default.

And this thread doesn't even stay in the Announcements category where it belongs?
The title doesn't draw attention to click on it and read the thread, either. It should begin with “ATTENTION!..”
It just happens that I click on almost every thread, so I clicked on this one too. I didn't know it was something important.

2 Likes

Thank you for your valuable feedback. I apologise for the inconvenience caused by the short notice.
Before rolling out this change, several factors were considered:

  • This change fixes a very critical issue.
  • This is basically a change for which the majority of SNOs have to do nothing.
    We stopped supporting watchtower a long time ago, after we rolled out the auto-updating docker images. So for most setups, unless the container is manually restarted, they will continue to run the old images if they're using the :latest tag. This post was actually made to let the community know about the change and what to expect.
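For reference, a container started from the :latest tag keeps running its original image until it is recreated; a manual refresh looks roughly like this (the container name storagenode is assumed):

    docker pull storjlabs/storagenode:latest   # fetch the new image
    docker stop -t 300 storagenode             # give the node time to shut down cleanly
    docker rm storagenode
    # re-run your usual docker run command; it will pick up the new image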
3 Likes

Now that explains to me why you needed more time for testing.
How do you mount storage there?

1 Like

Hi, I would like to let everyone know that I have updated the docker container on UNRAID and the upgrade went normally and perfectly. I use the standard configuration for the path and the container updates automatically.

3 Likes

Could this be the reason? All of a sudden I am getting the same error:

It could be possible, because the image also got updates for the underlying OS.
However, I tested this new image for ARM64 and it didn't have this issue.
This usually happens if the ca-certificates package is not updated.
The OS is updated, by the way: Dockerfile: upgrade distro buster to bookworm (#22) · storj/storagenode-docker@7784e2a · GitHub

And the actual problem in the linked thread is:

thus a solution:

It seems the docker engine may have changed?

I too believe it's completely and entirely your fault, and the purpose of these comments is not to jump on the bandwagon of victim blaming.

Any software update can bring unknown number of new bugs.

If you are blindly updating to whatever is pushed to the latest tag, you are choosing to live by someone else's schedule and accepting that things will sometimes break, regardless of whether there is any communication about it, including for reasons entirely outside of storj, like OS changes.

This was already addressed in the comment above Storagenode Docker Update: Changes To Binaries Location For Persistence Between Restarts - #18 by Ambifacient

By the way, this is precisely why storj deploys storagenode updates in waves. And yet you assume that container updates are inconsequential and update the container itself ASAP. That does not seem wise to me.

If you want more time, don't update the containers, or any other software for that matter, automatically. Postpone it until you have time to read the release notes, update, and address all the possible fallout, or revert.

And yet you still auto-update software. I'm baffled that we have to discuss this with someone who has been dealing with software “long enough”.

It was communicated in the release notes, which you had a practically infinite amount of time to read before installing the update. But you chose to blindly auto-update.

TLDR: there can be changes outside of what's documented and bugs that sneak past QA; therefore it's on you to manage updates responsibly and not ingest any new piece of software just because it's out.

2 Likes

I think we could change this to affect fewer non-default setups: we would still download to /app/config/bin if there are no binaries, but then copy them to a folder inside the container which is not exposed to the host and where we can set proper permissions, like /app/bin, and run the binaries from there.
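A rough entrypoint-style sketch of that idea (the download_binaries helper and the exact paths are illustrative, not the actual implementation):

    PERSIST_DIR=/app/config/bin   # exposed to the host, may be mounted noexec
    RUN_DIR=/app/bin              # internal to the container, permissions under our control

    # download only if the persistent copy is missing
    [ -f "$PERSIST_DIR/storagenode" ] || download_binaries "$PERSIST_DIR"

    # copy into the container and run from there
    mkdir -p "$RUN_DIR"
    cp "$PERSIST_DIR"/storagenode "$PERSIST_DIR"/storagenode-updater "$RUN_DIR"/
    chmod 755 "$RUN_DIR"/storagenode "$RUN_DIR"/storagenode-updater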

Right now there are three solutions (a combined sketch follows the list):

  1. Add exec to the mount options in /etc/fstab, if possible (defaults already includes it).
  2. Use a docker volume; however, this doesn't seem to work on QNAP for some reason: Node stopped working as docker container on qnap-container-station - #20 by beli
  3. Use the variable BINARY_DIR to redirect the binary folder inside the container.
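Roughly, options 1 and 3 look like this (option 2 was sketched earlier in the thread; the device, mount point, and BINARY_DIR value here are examples only):

    # 1. /etc/fstab: make sure the data mount allows executing binaries.
    #    "defaults" already implies "exec", so only an explicit "noexec" is a problem:
    /dev/sdb1  /mnt/storj  ext4  defaults  0  2

    # 3. redirect the binary folder to a path inside the container
    #    by adding an environment variable to your docker run command:
    #    -e BINARY_DIR=/app/bin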
1 Like

Just to let you know, I have an ARM QNAP, the TS-230, and the container upgrade went smoothly.

2 Likes

Please do consider that approach, because this change seems to have broken the very default setup on Synology that I use. I was able to add the permissions manually, but default Synology shares seem not to work. The whole point of containerizing software is to make it able to run on a wide range of different environments. Don't impose this kind of file-rights management on SNOs when you don't need to. Persisting the binaries is fine, but running them from an external file system is asking for issues. Copying them back into the container and managing rights there would fix all those issues.

5 Likes

What if the updater updates the binary after the container has been started, outside of the entry point?

On Unix systems a running executable points to the file's inode. Updating the file changes the inode, but the running process will remain pointed to the inode it started with.
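A quick way to observe this on Linux (the file names are arbitrary and /proc is Linux-specific):

    cp /bin/sleep ./mynode && ./mynode 600 &   # start the "old" binary
    ls -i mynode                               # note the inode number
    cp /bin/sleep mynode.new
    mv mynode.new mynode                       # atomic replace: the name now has a new inode
    ls -i mynode                               # different inode...
    ls -l /proc/$!/exe                         # ...but the running process still shows the
                                               # original executable, marked "(deleted)"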

Wow, barely 3 hours after my post. Thanks @Alexey, that's awesome!

2 Likes

That's not a concern, and it is irrelevant in the context of handling the shadow executables' location.

Two possibilities here:

  • The updater updates the file in the /app/bin location. Who copies it to /app/config/bin?
  • The updater updates the file in the /app/config/bin location. Who copies it to /app/bin?

I.e. the updater needs to be aware of these two different locations, or there needs to be some wrapper service that wraps the updater and handles copying the executable.

I don’t see it in the PR.

I see that the updater downloading a new binary is only handled in the entry point. But the updater runs every 15 minutes. What happens when one of those invocations results in a new binary being downloaded?

1 Like

Depends on which one. I introduced another one, BINARY_STORE; it's basically permanent storage on the data location by default. The binaries would be copied from there when the container is restarted. However, if they are older than the current version, they will be replaced and copied back.
I do not know what would happen if you put newer versions there, but it's easy to check :wink:

If you mount another folder as the BINARY_DIR location inside the container, then it depends on the OS. On Linux you may replace the binaries, but they will likely only be picked up on restart, and before that they would be replaced by the binaries from BINARY_STORE, which can themselves be updated to the current version if they were outdated (see above).
On Windows I think you cannot replace them because they would be locked; however, I do not know how the 9p protocol (WSL2) or SMB/CIFS (Hyper-V) would handle that.
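The restart-time behavior described above, as a pseudo-shell sketch (is_outdated and download_binaries are placeholders, not the real entrypoint code):

    # on container start:
    if [ ! -f "$BINARY_STORE/storagenode" ] || is_outdated "$BINARY_STORE/storagenode"; then
        download_binaries "$BINARY_STORE"   # refresh the permanent copies
    fi
    cp "$BINARY_STORE"/* "$BINARY_DIR"/     # binaries are executed from BINARY_DIR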

It’s running with these parameters:

So, storagenode-updater knows about the binary location $BINARY_DIR, and all the binaries would be there.
But you are correct, it wouldn't copy them back to $BINARY_STORE, so the issue would be spawned again on the next container restart after the update. If there were an exception, the container would be restarted and that would fix it; if there were no exception, it would simply self-update again.

What would you suggest?

This is an option:

Or, better yet, the change needs to be made in the updater itself: it would need to update the executable in both places, and then restart the node correctly.

Alternatively, it could write the most recent node version it ever downloaded to the data store, and then on container start update to at least that version (i.e. if the cursor and mask indicate a lower version, it will update to the last known higher version).

This would solve the original problem without storing executables in the data folder in the first place or copying them around.
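Sketched in pseudo-shell, with a hypothetical .last-version marker file and version_lt, installed_version, and download_binaries helpers:

    LAST_VERSION_FILE=/app/config/.last-version

    # updater side: after a successful download, remember the version
    echo "$NEW_VERSION" > "$LAST_VERSION_FILE"

    # entrypoint side: on container start, never run anything older than that
    if version_lt "$(installed_version)" "$(cat "$LAST_VERSION_FILE")"; then
        download_binaries "$(cat "$LAST_VERSION_FILE")"
    fi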