All docker nodes (except one) crashing because of updater

All my storagenodes are crashing because of the updater. I have 8 storagenodes on my server, 1 for each hard drive. After updating to the latest docker image (30 min ago), 7 out of 8 docker containers are crashing with the following message:

Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.

Seems like you cannot use more than one docker instance on a server. Is there a way to customize the port of supervisord?

P.S. I am using --net=host in my docker run.

As a workaround, you could run the old docker image until this gets fixed, use docker with a bridge network, or migrate the docker nodes to the native Linux binary. Just a few ideas on how you can get your storage nodes back online.

i've got 2 nodes running in the same proxmox container with docker inside, both updated to storagenode v1.50.4

they work fine…

when i first installed the new version it exited because the updater wasn’t downloaded or something like that, but after a bit the error simply went away and hasn’t shown up since.

but that was when we were testing the new image, before it was released.

Thanks for the suggestions. I reverted to the image with ID f1a22a725dd5 and disabled auto update of my docker nodes. Once the github issue is fixed, I will switch to the latest tag again.

It happens only if you use --network host, because all containers then listen on the host network and thus each must have a unique port. The supervisor is started on a fixed port. This is the essence of the bug. The fix is either to listen on a random free port, to allow configuring the listening port, or to use UNIX sockets (since there should not be any external activity).
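
To illustrate the mechanism, here is a minimal Python sketch. It is not the storagenode or supervisord code, and the helper names try_bind and demo are made up for this example. It shows that a second listener on the same fixed port in the same network namespace fails with "address already in use", while asking the OS for port 0 yields a free port.

```python
import socket
from typing import Optional, Tuple


def try_bind(port: int) -> Optional[socket.socket]:
    """Listen on 127.0.0.1:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically EADDRINUSE: another program is already listening here.
        sock.close()
        return None


def demo() -> Tuple[bool, int, int]:
    # First "container": port 0 lets the OS pick a free port, which we then
    # treat as the hard-coded port every container would try to use.
    first = try_bind(0)
    assert first is not None
    fixed_port = first.getsockname()[1]

    # Second "container" in the same network namespace tries the same port.
    second = try_bind(fixed_port)
    conflict = second is None

    # Asking for port 0 again gives a different, free port instead.
    third = try_bind(0)
    assert third is not None
    random_port = third.getsockname()[1]

    for sock in (first, second, third):
        if sock is not None:
            sock.close()
    return conflict, fixed_port, random_port


if __name__ == "__main__":
    print(demo())
```

With port mapping each container has its own network namespace, so the fixed port inside the container never collides with the others.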

Any update on when this will be fixed? I see there have been two more releases since the issue was introduced, but the github issue is still open. Right now I cannot update any of my servers because of this.

You can track this issue: [storagenode] The multiple nodes with `--network host` after upgrade to 1.50.4 cannot work anymore · Issue #4661 · storj/storj · GitHub

At the moment I can suggest removing --network host and using port mapping; this resolves the issue.
The other solution is to migrate to binaries, since you already have unique ports for all needed services.

I have 25 nodes over 5 servers and unfortunately I cannot do either of them due to the complexity and time involved. Changing docker to not use host networking requires me to manually map all the ports used.

It is already non-trivial to run the nodes, and the payout does not make it worthwhile to keep spending time on server maintenance every time something breaks due to an update or new feature. I don't understand why storj thinks it is ok to break existing deployments instead of staying backwards compatible. I am running nodes because I had hosting automated, updates included, and it did not require my time.

Currently I am just going to leave the servers as they are. If storj decides to fix the issue before my nodes get disqualified, then I will re-enable automatic updates to the latest docker image. Otherwise, if the nodes get disqualified, then that’s it for my SNO journey.

Yes, non-standard configurations were not tested. In our documentation the docker nodes are configured with port mapping.
I would kindly ask you to update your docker run commands to use port mappings instead of --network host.
I’m not running that many nodes, but I use a docker-compose.yaml file, so any modification and re-run is a piece of cake.
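
For anyone with many nodes, the mappings do not have to be typed by hand. Below is a small Python sketch that prints the -p flags for each node on a server. The container ports 28967 (node traffic) and 14002 (dashboard) are assumed here to be the defaults in each container, and the host port scheme (base port plus node index) and the function name port_flags are just an illustration, so adjust them to whatever your nodes already use.

```python
from typing import List

# Assumed container-side defaults; verify against your own node config.
NODE_PORT = 28967
DASHBOARD_PORT = 14002


def port_flags(index: int, node_base: int = NODE_PORT,
               dash_base: int = DASHBOARD_PORT) -> List[str]:
    """Return docker -p flags for node number `index` (0-based).

    Each node gets its own host ports (base + index), mapped onto the
    same fixed ports inside its container.
    """
    node_port = node_base + index
    dash_port = dash_base + index
    return [
        f"-p {node_port}:{NODE_PORT}/tcp",
        f"-p {node_port}:{NODE_PORT}/udp",
        # Dashboard bound to localhost only.
        f"-p 127.0.0.1:{dash_port}:{DASHBOARD_PORT}",
    ]


if __name__ == "__main__":
    for i in range(8):  # 8 nodes on one server, as in the first post
        print(f"node {i}: " + " ".join(port_flags(i)))
```

The output can be pasted into the existing docker run commands in place of --network host. The external address each node advertises would presumably have to use the matching host port.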

you could just drop the --network host parameter. i don’t really see the point in using it. i only use one nic on my server for the lan, and it has like 15 different vm’s, each with its own ip address. some of the nodes even share a container, which has a single ip… so they all run on different ports…

dunno what the point of that parameter is, aside from making docker stick to the host network…
i guess if you are on a corporate network and cannot expand the ip range you got… but then you could most likely just do a vlan or similar…

if the whole issue comes from the --network host parameter, it seems to me the easiest way to fix it would be to drop that parameter.

i just use the docker run command and notepad++
notepad++ makes it fairly easy to change and replace stuff.

sure there are configurations i would hate to have to deal with… like for one i’m pondering moving my nodes into a vm and do HBA passthrough to try and solve a problem i’ve been chasing for a couple of years now…

so i doubt it will even work… but i might attempt that, but this will mean yet another complete restructuring of my storj node setup, which will be like the 4th or 5th time…
the process is getting more and more streamlined every time…
port mapping and ip’s so it makes sense and is easy to read…

sorry that you have issues… but i think the main issue seems to be related to planning and custom solutions.
it’s a good thing to try to stick to the defaults, because the more fringe a setup gets, the less tested it will be…