[Tech Preview] Storagenode Updater inside Docker

Oh no, I made a mistake. Version control was pointing to a nonexistent version. It should be fixed now. Please retry.

Getting the same result. Should I be using a new image?

Also, this seems like a pretty big flaw in the system. If for some reason the url isn’t reachable or the file doesn’t exist, recreating the container would simply make it impossible for the node to start again. Could a fallback be built in that keeps a copy of the last working binaries in the storage location and reverts to them in case something like that happens? Otherwise I’m really not feeling good about this approach.
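A minimal sketch of the fallback idea, assuming the entrypoint downloads with wget and that a cache path inside the mounted storage location is available. The function name `fetch_binary` and its three arguments are hypothetical and not part of the actual image:

```shell
#!/bin/sh
# Hypothetical helper: download a binary, keeping a "last known good" copy.
# $1 = download URL, $2 = target path, $3 = cache path (e.g. in the storage mount)
fetch_binary() {
    url="$1"
    target="$2"
    cache="$3"

    # Download to a temporary file so a failed download never clobbers the target.
    if wget -q -O "$target.tmp" "$url"; then
        mv "$target.tmp" "$target"
        cp "$target" "$cache"    # remember this copy for the next start
        return 0
    fi

    rm -f "$target.tmp"
    if [ -f "$cache" ]; then
        echo "download failed, falling back to cached copy: $cache" >&2
        cp "$cache" "$target"
        return 0
    fi

    echo "download failed and no cached copy is available" >&2
    return 1
}
```

With something like this, recreating the container while the URL is unreachable would still start the node from the cached binary, and only a first-ever start without network access would fail.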

I also agree with @CutieePie that this isn’t really how Docker should work. You already use a custom version of watchtower. Wouldn’t it be possible to make watchtower aware of the latest version and have it use the rollout system to determine when to trigger automated updates of the node? I’m aware that this wouldn’t block SNOs from updating earlier, but how many would really do that manually? And if they do, how likely is it that they will do that very quickly?

I just really don’t like that I could end up with a container that is unable to even download the binary.

OK, great. I made a second mistake as well and forgot to publish the v1.50.3 release. That is fixed now too, and I see no reason why it shouldn’t find the binaries.

Yes, you should set VERSION_SERVER_URL in the run command to point to https://version.qa.storj.io

Any way to opt out of this? It looks bad to me.

Will do! This should probably be added here as well then: Please join our public test network

Tried again and also set the VERSION_SERVER_URL in the run command, but same result.

Also tried without the VERSION_SERVER_URL set and get the same error (with different url of course).

Or maybe:

--version.server-url="https://version.qa.storj.io"

There is another option for that which I already set. But the entrypoint script uses the environment variable before the binaries ever run. So for now you have to set it twice. I would suggest using the environment variable for both. But as you can see from the log I posted, setting the environment variable made it use the QA server. But it still didn’t work.
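To illustrate why the environment variable matters before any binary runs, here is a rough sketch of how an entrypoint script could pick the version server. The function name and the production default are assumptions; only the query path is taken from the commands in this thread:

```shell
#!/bin/sh
# Sketch only: the real entrypoint script may look different.

# Assumed default when VERSION_SERVER_URL is not set.
DEFAULT_VERSION_SERVER_URL="https://version.storj.io"

# Print the URL that would be queried for the storagenode download link.
version_query_url() {
    server="${VERSION_SERVER_URL:-$DEFAULT_VERSION_SERVER_URL}"
    printf '%s/processes/storagenode/minimum/url?os=linux&arch=amd64\n' "$server"
}
```

Because a script like this only sees the environment, a `--version.server-url` flag passed to the binaries would not affect the initial download, which matches the need to set the value in both places.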

The issue linked above implies that this fixes a problem of being unable to push a fix for bad code to docker containers. Does that mean this updater will only be used in rare cases of known bad code as opposed to a general updater? I would consider this a much better idea than having a self-updating docker container. That would at a minimum increase the difficulty of downgrading.

Downgrading is a bad idea. If a release includes database migrations, they will not be reverted, and the downgraded version will not work.

I’ve been trying to reproduce the issue for hours without success, at least on Docker Desktop for Mac.
I can’t reproduce it with either the test or the production version server URL:

docker run -d --restart unless-stopped --stop-timeout 300 -p 14003:14002 -p 28968:28967/tcp -p 28968:28967/udp \
 -e WALLET="0xXX" \
 -e EMAIL="email" \
 -e ADDRESS="dns:28968" \
 -e STORAGE="1TB" \
 -e VERSION_SERVER_URL="https://version.qa.storj.io" \
 --mount type=bind,source="...",destination=/app/identity \
 --mount type=bind,source="...",destination=/app/config \
 --name nodetestnet \
 storjlabs/storagenode:35efb6462-go1.17.5

The output as expected:

downloading storagenode-updater

Connecting to version.qa.storj.io (35.188.169.133:443)

writing to stdout

-                    100% |********************************|    92  0:00:00 ETA

written to stdout

Connecting to github.com (140.82.121.3:443)

Connecting to objects.githubusercontent.com (185.199.111.133:443)

saving to '/tmp/storagenode-updater.zip'

storagenode-updater. 100% |********************************| 7910k  0:00:00 ETA

...

Anything I can do on my end to help figure it out?

What happens when you change the entrypoint to sh and connect to the QA version server with wget?

This should open the terminal inside the container:

docker run -i -t --name testnetwork --rm --entrypoint sh storjlabs/storagenode:35efb6462-go1.17.5

Connect to the QA version server:

wget -O- version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64

If wget fails to connect, install GNU wget and try again:

apk add wget

wget -O- version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64
$ docker run -i -t --name testnetwork --rm --entrypoint sh storjlabs/storagenode:35efb6462-go1.17.5
Unable to find image 'storjlabs/storagenode:35efb6462-go1.17.5' locally
35efb6462-go1.17.5: Pulling from storjlabs/storagenode
Digest: sha256:57afdc03a5e31106bc929bd24c201150dcfffecaab68be65c9f45a04e5776940
Status: Downloaded newer image for storjlabs/storagenode:35efb6462-go1.17.5
/app # wget -O- version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64
/app # Connecting to version.storj.io (35.224.88.204:80)
Connecting to version.storj.io (35.224.88.204:443)
wget: error getting response: Connection reset by peer
apk add wget
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
(1/3) Installing libunistring (0.9.10-r1)
(2/3) Installing libidn2 (2.3.2-r0)
(3/3) Installing wget (1.21.2-r2)
Executing busybox-1.34.1-r3.trigger
OK: 67 MiB in 38 packages
[1]+  Done(1)                    wget -O- version.storj.io/processes/storagenode/minimum/url?os=linux
/app # wget -O- version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64
/app #
Redirecting output to 'wget-log'.
/app #

I did also notice you were using a different image than in the top post.

Uh, so my suspicion has been confirmed. It is an issue with BusyBox wget, so we have to switch to GNU wget in the storagenode-base image.
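For anyone who wants to check which wget they are running, a small sketch follows. It assumes that GNU wget prints "GNU Wget" for `--version` and that BusyBox mentions "BusyBox" in the usage text it prints for an unknown option; `classify_wget` is a hypothetical helper name:

```shell
#!/bin/sh
# Classify a wget implementation from the text it prints for --version.
# The matched strings are assumptions about typical GNU and BusyBox output.
classify_wget() {
    case "$1" in
        *"GNU Wget"*) echo gnu ;;
        *BusyBox*)    echo busybox ;;
        *)            echo unknown ;;
    esac
}

# Usage inside the container:
#   classify_wget "$(wget --version 2>&1)"
```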

I think you’d also have to enclose the URL in quotes. The shell seems to push the command to the background because of the & in there.
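A quick way to see the effect of the unquoted & without touching the network. `print_arg` is just a throwaway helper for this demonstration:

```shell
#!/bin/sh
# Throwaway helper that shows the argument it actually received.
print_arg() { printf 'got: %s\n' "$1"; }

# Unquoted: the shell cuts the line at '&', runs the first part in the
# background, and treats "arch=amd64" as a plain variable assignment.
print_arg version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64
wait
echo "arch is now: $arch"

# Quoted: the whole URL, including '&arch=amd64', arrives as one argument.
print_arg "version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64"
```

The first call only receives the URL up to `os=linux`, which matches the truncated command shown in the `[1]+  Done(1)` job line in the earlier output.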

$ docker run -i -t --name testnetwork --rm --entrypoint sh storjlabs/storagenode:35efb6462-go1.17.5
/app # wget -O- "version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64"
Connecting to version.storj.io (35.224.88.204:80)
Connecting to version.storj.io (35.224.88.204:443)
wget: error getting response: Connection reset by peer
/app # wget -O- "https://version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64"
Connecting to version.storj.io (35.224.88.204:443)
wget: error getting response: Connection reset by peer
/app # apk add wget
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
(1/3) Installing libunistring (0.9.10-r1)
(2/3) Installing libidn2 (2.3.2-r0)
(3/3) Installing wget (1.21.2-r2)
Executing busybox-1.34.1-r3.trigger
OK: 67 MiB in 38 packages
/app # wget -O- "https://version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64"
--2022-03-17 22:49:47--  https://version.storj.io/processes/storagenode/minimum/url?os=linux&arch=amd64
Resolving version.storj.io (version.storj.io)... 35.224.88.204
Connecting to version.storj.io (version.storj.io)|35.224.88.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84 [text/plain]
Saving to: 'STDOUT'

-                   100%[===================>]      84  --.-KB/s    in 0s

2022-03-17 22:49:47 (25.3 MB/s) - written to stdout [84/84]

/app #

Looks like the entrypoint already does that though.

Yes that’s true, but it still fails with BusyBox wget

@clement

I get the same error message with VERSION_SERVER_URL="https://version.qa.storj.io" but it works fine with VERSION_SERVER_URL=https://version.qa.storj.io
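One possible explanation, which this thread does not confirm, is that the quotes reach the container as part of the value, as can happen with some compose or env-file syntaxes. Under that assumption, a script could defensively strip one pair of surrounding double quotes. `strip_quotes` is a hypothetical helper:

```shell
#!/bin/sh
# Remove one pair of surrounding double quotes from a value, if present.
strip_quotes() {
    value="$1"
    case "$value" in
        \"*\")
            value="${value#\"}"   # drop the leading quote
            value="${value%\"}"   # drop the trailing quote
            ;;
    esac
    printf '%s\n' "$value"
}

# Usage:
#   VERSION_SERVER_URL=$(strip_quotes "$VERSION_SERVER_URL")
```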

But at the end the process just hangs:

testnode_1   | 2022-03-17T22:57:34.734Z INFO    Running on version      {"Service": "storagenode-updater", "Version": "v1.50.3"}
testnode_1   | 2022-03-17T22:57:34.734Z INFO    Downloading versions.   {"Server Address": "https://version.qa.storj.io"}
testnode_1   | 2022-03-17T22:57:35.212Z INFO    Current binary version  {"Service": "storagenode", "Version": "v1.48.2"}
testnode_1   | 2022-03-17T22:57:35.212Z INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.50.3/storagenode_linux_amd64.zip", "To": "/tmp/storagenode_linux_amd64.1411460718.zip"}
testnode_1   | 2022-03-17T22:57:38.131Z INFO    Download finished.      {"From": "https://github.com/storj/storj/releases/download/v1.50.3/storagenode_linux_amd64.zip", "To": "/tmp/storagenode_linux_amd64.1411460718.zip"}
testnode_1   | 2022-03-17 22:57:38,131 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1   | 2022-03-17T22:57:38.155Z INFO    Restarting service.     {"Service": "storagenode"}
testnode_1   | 2022-03-17T22:57:38.540Z ERROR   Error updating service. {"Service": "storagenode", "error": "error stopping storagenode service: strconv.Atoi: parsing \"unix:///run/supervisord.sock no such file\": invalid syntax", "errorVerbose": "error stopping storagenode service: strconv.Atoi: parsing \"unix:///run/supervisord.sock no such file\": invalid syntax\n\tmain.restartService:37\n\tmain.update:68\n\tmain.loopFunc:27\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tmain.cmdRun:128\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:12\n\truntime.main:255"}
testnode_1   | 2022-03-17T22:57:38.552Z INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.50.3"}
testnode_1   | 2022-03-17T22:57:38.553Z INFO    Version is up to date   {"Service": "storagenode-updater"}
testnode_1   | 2022-03-17T22:57:38.771Z ERROR   contact:service contact/service.go:103  ping satellite failed   {"Satellite ID": "12ZQbQ8WWFEfKNE9dP78B1frhJ8PmyYmr8occLEf1mQ1ovgVWy", "attempts": 4, "error": "ping satellite: check-in ratelimit: node rate limited by id", "errorVerbose": "ping satellite: check-in ratelimit: node rate limited by id\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:136\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:98\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
testnode_1   | storj.io/storj/storagenode/contact.(*Service).pingSatellite
testnode_1   |  /go/src/storj.io/storj/storagenode/contact/service.go:103
testnode_1   | storj.io/storj/storagenode/contact.(*Chore).updateCycles.func1
testnode_1   |  /go/src/storj.io/storj/storagenode/contact/chore.go:87
testnode_1   | storj.io/common/sync2.(*Cycle).Run
testnode_1   |  /go/pkg/mod/storj.io/common@v0.0.0-20220131120956-e74f624a3d55/sync2/cycle.go:92
testnode_1   | storj.io/common/sync2.(*Cycle).Start.func1
testnode_1   |  /go/pkg/mod/storj.io/common@v0.0.0-20220131120956-e74f624a3d55/sync2/cycle.go:71
testnode_1   | golang.org/x/sync/errgroup.(*Group).Go.func1
testnode_1   |  /go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
testnode_1   | 2022-03-17 22:57:41,773 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1   | 2022-03-17 22:57:44,776 WARN killing 'storagenode-updater' (104) with SIGKILL
testnode_1   | 2022-03-17 22:57:44,777 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1   | 2022-03-17 22:57:44,780 INFO stopped: storagenode-updater (terminated by SIGKILL)
testnode_1   | 2022-03-17T22:57:44.780Z INFO    process/exec_conf.go:116        Got a signal from the OS: "terminated"
testnode_1   | 2022-03-17T22:57:44.781Z INFO    contact:service contact/service.go:107  context cancelled       {"Satellite ID": "12ZQbQ8WWFEfKNE9dP78B1frhJ8PmyYmr8occLEf1mQ1ovgVWy"}
testnode_1   | 2022-03-17 22:57:44,805 INFO stopped: storagenode (exit status 0)
testnode_1   | 2022-03-17 22:57:44,806 INFO stopped: processes (terminated by SIGTERM)

Now the Docker container is still running, so Docker doesn’t restart it. Not a great ending.
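The `strconv.Atoi` error in the log shows an error message being parsed as a number, presumably where a process ID was expected. As a sketch of the kind of check that would catch this earlier, a script could verify that a value looks like a PID before using it. `is_pid` is a hypothetical helper and not taken from the updater's code:

```shell
#!/bin/sh
# Succeed only if the argument is a non-empty string of digits.
is_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Usage (the variable name is illustrative):
#   if is_pid "$output"; then kill "$output"; else echo "not a PID: $output" >&2; fi
```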

Neither of those works on my end, unfortunately.

I’m a little confused. If we’re all using the same Docker image, we should all be using the same wget version. Why is it just me getting this error?