Node repeatedly stops every few hours after an update yesterday

Here is the log of one of the nodes just before it stopped:

2022-04-07T12:12:45.880Z        INFO    piecestore      uploaded        {"Piece ID": "LI6WPF57OMMQXQN532PPTRGXMI3V6IDG5OMPF7W22XXF6BY5533Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 4864}
2022-04-07T12:12:56.485Z        INFO    piecestore      upload started  {"Piece ID": "H2B2DW3EXLYZEW67ARP7KO2IY5AOCUMXHO7YMATML572QI3CXIYA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 7942257534980}
2022-04-07T12:12:56.751Z        INFO    piecestore      uploaded        {"Piece ID": "H2B2DW3EXLYZEW67ARP7KO2IY5AOCUMXHO7YMATML572QI3CXIYA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Size": 13312}
2022-04-07T12:12:57.939Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-07T12:12:58.456Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-07T12:12:58.456Z        INFO    Version is up to date   {"Service": "storagenode"}
2022-04-07T12:12:58.466Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.50.4"}
2022-04-07T12:12:58.467Z        INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.52.2/storagenode-updater_linux_amd64.zip", "To": "/tmp/storagenode-updater_linux_amd64.2034414796.zip"}
2022-04-07T12:12:59.643Z        INFO    Download finished.      {"From": "https://github.com/storj/storj/releases/download/v1.52.2/storagenode-updater_linux_amd64.zip", "To": "/tmp/storagenode-updater_linux_amd64.2034414796.zip"}
2022-04-07T12:12:59.654Z        INFO    Restarting service.     {"Service": "storagenode-updater"}
2022-04-07 12:12:59,657 INFO exited: storagenode-updater (exit status 1; not expected)
2022-04-07T12:13:00.184Z        INFO    bandwidth       Performing bandwidth usage rollups
2022-04-07 12:13:00,187 INFO spawned: 'storagenode-updater' with pid 345
2022-04-07 12:13:00,188 WARN received SIGQUIT indicating exit request
2022-04-07 12:13:00,188 INFO waiting for processes, storagenode, storagenode-updater to die
2022-04-07T12:13:00.201Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "contact.external-address"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "operator.wallet-features"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "server.private-address"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-disk-space"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "server.address"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-bandwidth"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "operator.email"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file key  {"Key": "operator.wallet"}
2022-04-07T12:13:00.201Z        INFO    Invalid configuration file value for key        {"Key": "log.level"}
2022-04-07T12:13:00.202Z        INFO    Running on version      {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-07T12:13:00.203Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-07T12:13:00.741Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-07T12:13:00.741Z        INFO    Version is up to date   {"Service": "storagenode"}
2022-04-07T12:13:00.749Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-07T12:13:00.749Z        INFO    Version is up to date   {"Service": "storagenode-updater"}
2022-04-07 12:13:03,752 INFO waiting for processes, storagenode, storagenode-updater to die
2022-04-07 12:13:06,756 INFO waiting for processes, storagenode, storagenode-updater to die
2022-04-07T12:13:08.506Z        INFO    piecestore      upload started  {"Piece ID": "27WIHJGC4W5MIJPKWTP2BHZXTNW3GINDR3IUOVZTL6NN6MFR6OFA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT_REPAIR", "Available Space": 7942257521156}
2022-04-07T12:13:08.525Z        INFO    piecestore      uploaded        {"Piece ID": "27WIHJGC4W5MIJPKWTP2BHZXTNW3GINDR3IUOVZTL6NN6MFR6OFA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT_REPAIR", "Size": 768}
2022-04-07 12:13:10,528 WARN killing 'storagenode-updater' (345) with SIGKILL
2022-04-07 12:13:10,528 INFO waiting for processes, storagenode, storagenode-updater to die
2022-04-07 12:13:10,530 INFO stopped: storagenode-updater (terminated by SIGKILL)
2022-04-07T12:13:10.530Z        INFO    Got a signal from the OS: "terminated"
2022-04-07T12:13:10.531Z        INFO    piecestore      upload canceled {"Piece ID": "Y6AGVPAMHHIHFLISYKBO6I7H3HDL573BC4JIRC5MWLU5BQBFM3PQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 1572864}
2022-04-07 12:13:10,611 INFO stopped: storagenode (exit status 0)
2022-04-07 12:13:10,611 INFO stopped: processes (terminated by SIGTERM)
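
The log above mixes two kinds of lines: storagenode lines with an ISO timestamp (2022-04-07T12:12:59.654Z) and lines with a space-separated timestamp and comma milliseconds (2022-04-07 12:12:59,657), which appear to come from the process supervisor inside the container. A minimal sketch for pulling out only the second kind, to see when processes exited, were spawned, or were killed. The function name supervisor_events is hypothetical and the pattern is only inferred from the lines shown here:

```shell
#!/bin/sh
# Keep only lines that start with "YYYY-MM-DD HH:MM:SS,mmm ", the timestamp
# format used by the supervisor-style lines in the log above (an assumption
# based on this excerpt, not on any documented format).
supervisor_events() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} '
}

# Usage against a real container (the name is an assumption):
# docker logs storagenode 2>&1 | supervisor_events
```

In the excerpt above this leaves the exited / spawned / SIGQUIT / SIGKILL / stopped sequence, which makes the order of events easier to follow.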

I hadn’t updated for about 2 weeks, then yesterday I noticed there was no ingress traffic.
So I updated all my nodes.

There were no problems during the updates and the nodes ran normally, until a few hours later all of them stopped.

I started all the nodes again, but then they all stopped a second time; this time the service lasted about 4-5 hours.

So here is the log of one of my nodes. Please take a look, thanks.

All my nodes run in docker on Unraid.

This seems to be caused by a storagenode update followed by a storagenode-updater update a few hours later. Please make sure you use the --restart unless-stopped option in your docker run command to ensure the container restarts automatically.
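
For a container that already exists, the restart policy can be checked and changed without recreating it. A minimal sketch using the standard docker inspect and docker update commands; the container name storagenode and the two helper names are assumptions for illustration:

```shell
#!/bin/sh
# Container name is an assumption; replace it with your own.
NODE="${NODE:-storagenode}"

# Print the current restart policy name (for example "no", "always",
# or "unless-stopped").
show_restart_policy() {
    docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Switch an existing container to the unless-stopped policy in place.
set_unless_stopped() {
    docker update --restart unless-stopped "$1"
}

# Usage (uncomment to run against a real Docker daemon):
# show_restart_policy "$NODE"
# set_unless_stopped "$NODE"
```

If the container is managed by a frontend that recreates it from a template, the option would presumably also need to be added there, otherwise the change is lost the next time the container is recreated.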

I encountered the same situation. Unfortunately, setting the container restart policy to unless-stopped is not a solution for me.
How can I disable the auto update service inside the container or how can I schedule the auto update service?

I have the same problem with unraid and 5 storj dockers… Auto start is on.

Same here. Unraid, with 3 storj nodes. Added the restart policy for the time being, and hopefully that works :crossed_fingers:

This option in the docker run command does the same as your script.

There is no way, at least with the docker version.
Why can’t you add --restart unless-stopped to your docker run command?

Hello,

I don’t know if my problem is related, but I have had no ingress traffic since the 6th of April:

But I still have egress:

I’ve been running a Storj node for almost 1 year without any problem. I tried to restart the container but nothing changed.

EDIT: Here is my watchtower log:

[image: watchtower log]

So it seems to have updated today at 10:58, but the image is from 08/02/22.

EDIT 2: Crap! I think I found it: I renamed my storagenode container but forgot to change my watchtower arguments… For watchtower, my command line is:

docker run -d --restart=unless-stopped --name STORJ-watchtower -v /var/run/docker.sock:/var/run/docker.sock storjlabs/watchtower:latest STORJ-storagenode STORJ-watchtower --cleanup --interval 3600 --stop-timeout 300s

Can you confirm that “STORJ-storagenode” has to be the name of my storage node docker container, and “STORJ-watchtower” the name of my watchtower docker container?

@tigerblue77
I think you need to update the node, we are at version 1.52.2

Yes, you are correct. The parts I have made bold below tell watchtower what containers to monitor for updates.
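
A quick way to catch a renamed container is to compare the names passed to watchtower with the names Docker actually knows about. A minimal sketch: the helper name missing_containers is hypothetical, and docker ps -a --format '{{.Names}}' is used to list the names of all containers, running or stopped:

```shell
#!/bin/sh
# Print every name given as an argument that is NOT an existing container.
missing_containers() {
    existing=$(docker ps -a --format '{{.Names}}')
    for name in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: no output
        printf '%s\n' "$existing" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Usage with the names from the watchtower command quoted earlier:
# missing_containers STORJ-storagenode STORJ-watchtower
```

An empty result means every name given to watchtower matches a container; any name that is printed is one watchtower would not find.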

Because Storj is running alongside other services on the host and the startup is too heavy on the hardware to happen uncontrolled.
So I must find a way to either schedule the updates or disable/enable them on demand.

What side effects would it have if I blocked https://version.storj.io?

Same problem: since the last update my docker container keeps stopping on Unraid. If I lose my node again, now after 14 months, I am out of that game…

This is clearly a software problem and it needs to be fixed. It ran fine for 14 months with 100% uptime and now I get an update that kills my node?

Please fix it asap!

It’s better to start your nodes some time apart; they would then keep that interval in the future.
If you block the version server, the node will not be able to start.

However, I do not understand how it is related to --restart unless-stopped in your docker run command.
It also should not be a problem if each of your nodes runs on its own disk. If they all share the same disk, this is against the Node Operator Terms & Conditions and definitely creates a high load on your disk.

Yes, this needs to be fixed on Unraid so that the --restart unless-stopped option is in your docker run command.
It’s always better to use the docker CLI directly instead of an outdated application, which is a wrapper around the docker CLI anyway.

With --restart unless-stopped the container restarts at a random time.
A 9TB node takes approximately 15 minutes to fully start up.
During that time, other services (not other storj nodes) are impaired.

Furthermore, --restart unless-stopped causes tremendous trouble if the docker daemon (re-)starts, because Storj needs to start last on my system so that it does not impair the startup of other containers.

In the past I also learned that --restart unless-stopped can cause the container to get stuck in a restart loop if I make a mistake. Since it takes a while to get a node going, I personally prefer to have the container exit and stay offline for me to inspect in case of any complications.

Why is it necessary to run an updater, in a temporary container, alongside watchtower anyways?
The moment I recreate the container, I am running on the old version(s) again. Then the updater will update and restart the container a few hours later again.

We have a rolling update procedure so that a bug does not shut down the whole network.
The only remaining storagenode version not covered by this policy was the docker version. The watchtower patched to check at a random interval between 12 and 72 hours is not ideal, because we should wait at least 5 days before pushing the docker image.

Now this image is a self-updating service; the storagenode-updater inside it follows the same standard rollout policy as the normal binary nodes (Windows/Linux GUI and plain binaries, including macOS and FreeBSD).

This option (--restart unless-stopped) allows you to stop the container (and maybe even remove it after the stop) before you reboot or restart the docker daemon, and then start it (or run it) at the right time. It will keep restarting the container if it crashes, which can help to improve uptime; in this case it also helps the service to update if something goes wrong.
What I meant: if you stop a container that was run with this option, it will not be restarted automatically after a reboot. You will need to start it manually later. But while it is running, docker will restart the container if it crashes.
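
The stop-before-reboot and start-later workflow described here could be scripted roughly as follows. This is only a sketch under assumptions: the helper names and container names are hypothetical, and the 300-second grace period simply mirrors the --stop-timeout value in the watchtower command quoted earlier in the thread:

```shell
#!/bin/sh
# Stop a node gracefully before a reboot or a docker daemon restart.
# With --restart unless-stopped it then stays stopped until started again.
stop_node() {
    docker stop -t 300 "$1"
}

# Start the given containers one after another, waiting DELAY seconds
# between them so that they do not all load the host at the same time.
start_staggered() {
    delay=$1
    shift
    first=1
    for name in "$@"; do
        [ "$first" -eq 1 ] || sleep "$delay"
        first=0
        docker start "$name"
    done
}

# Usage (names and delay are assumptions; 900 s is about the 15 minutes
# a large node was said to need to start up):
# stop_node storagenode1
# start_staggered 900 storagenode1 storagenode2
```

Running start_staggered from a script that is launched after the other services are up would let the nodes start last and spaced apart, while the restart policy still covers exits during normal operation.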

Thanks for your help!

I restarted my watchtower 10h ago (with 1h interval) but it didn’t update my storagenode.
I think it’s because of the “--label=com.centurylinklabs.watchtower.enable=false” argument that I set at storagenode start… Can you confirm?

I set this argument because I have a containrrr/watchtower running for other containers and I didn’t want it to update my storagenode, as it is STORJ-watchtower’s role… In fact I didn’t want the two watchtowers to come into conflict but obviously I didn’t do it the right way. Maybe I only need one watchtower? If so, which one? Is the STORJ-watchtower a perfect clone of the containrrr/watchtower? I didn’t find any detailed documentation on this…

Thank you! I updated my storagenode manually and I’m getting files again :slight_smile:

Not really, just when updates happen. Which I agree is a problem. But at the moment the choice on your end is for it to restart or to stop. I think restart is the better option between those two. But I do feel like the container shouldn’t stop at all while updating. That would be something for Storj to fix. All you can do in the meantime is pick the better of 2 not so ideal choices.

I had the same fear initially, but this is actually not the case. The image doesn’t come with the binaries anymore. Instead on start it downloads the latest updater, then checks using the updater which version your node should be on according to the rollout and downloads the binary. This prevents downgrades.
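
To see which versions a container actually ended up on after a start, the "Current binary version" lines in the log can be summarized. A minimal sketch: the function name binary_versions is hypothetical, and the pattern is based only on the log lines posted at the top of this thread:

```shell
#!/bin/sh
# Print "service version" for every "Current binary version" log line,
# e.g. "storagenode v1.52.2". Other lines are ignored.
binary_versions() {
    sed -n 's/.*Current binary version.*"Service": "\([^"]*\)", "Version": "\([^"]*\)".*/\1 \2/p'
}

# Usage against a real container (the name is an assumption):
# docker logs storagenode 2>&1 | binary_versions
```

Comparing the output shortly after a recreate with the output a few hours later shows whether the updater changed anything in between.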

No, but it’s very close. As @Alexey mentioned already it has been modified to check at a random interval between 12 and 72 hours. That was more relevant when it was used to actually update the storagenode itself. If your setup would be simpler if you could just use one, I don’t think there is any problem in doing that. But as with any non-standard part of any setup, getting support on it may be harder. You’d be on your own. Which if you already use watchtower for other stuff, I’m guessing isn’t really an issue for you.

I added that line in the extra parameters. What I don’t understand: it never happened before the update. To me it seems like the update changed something that makes the docker container crash. It worked for months without stopping, until the last update.