All docker nodes (except one) crashing because of updater

570RJ · March 25, 2022, 1:49pm

All my storagenodes are crashing because of the updater. I have 8 storagenodes on my server, 1 for each hard drive. After updating to the latest docker image (30 min ago), 7 out of 8 docker containers are crashing with the following message:

Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.

Seems like you cannot use more than one docker instance on a server. Is there a way to customize the port of supervisord?

P.S. I am using --net=host in my docker run.

littleskunk · March 25, 2022, 2:56pm

github.com/storj/storj

[storagenode] The multiple nodes with `--network host` after upgrade to 1.50.4 cannot work anymore

opened 08:01AM - 25 Mar 22 UTC

AlexeyALeonov

Bug

***This ticket was escalated from Zendesk by Aleksey Leonov:***

- **Ticket:** […#15983](https://storj.zendesk.com/agent/tickets/15983)
- **Type:** Ticket
- **Priority:** -
- **Requester:** Aleksandr
- **Organization:** -
- **Assignee:** Aleksey Leonov

***

**Description**

If you run multiple nodes on the same host with `--network host`, then only one node can work.

**Steps to reproduce the issue:**

1. Run the first one node with `--network host`
2. Try to run a second one - it will fail

```
2022-03-25 06:02:33,005 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
Error: Another program is already listening on a port that one of our HTTP servers is configured to use. Shut this program down first before starting supervisord.
For help, use /usr/bin/supervisord -h
```

**Describe the results you expected:**

Multiple nodes with `--network host` can work as before

**Describe the results you received:**

You can run only one node

**Possible Fix**

Provide an ability to configure supervisord to listen on different ports?

**Additional information you deem important (e.g. issue happens only occasionally):**

**Logs:**

```
2022-03-25 06:02:33,005 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
Error: Another program is already listening on a port that one of our HTTP servers is configured to use. Shut this program down first before starting supervisord.
For help, use /usr/bin/supervisord -h
```

**Your environment**

- Operating system and version: Debian 10
- Additional environment details (Raspberry PI, Docker, VMWare, etc.): docker

Attachments:
[image001.png](https://storj.zendesk.com/attachments/token/lcUHu2CzM5X5kOPG59W7ij3ld/?name=image001.png)
[image001.png](https://storj.zendesk.com/attachments/token/ITsCmlRNLKgwUDdiN6xL5hdNK/?name=image001.png)

gz#15983

As a workaround, you could run the old docker image until this gets fixed, use docker with a bridge network, or migrate the docker nodes to the native Linux binary. Just a few ideas on how you can get your storage nodes back online.

SGC · March 25, 2022, 3:06pm

i got 2 nodes running on the same proxmox container with docker inside, which both are updated to storagenode v1.50.4

they work fine…

initially when i first installed the new version it exited because the updater wasn’t downloaded or something like that, but after a bit, it simply went away and hasn’t shown itself since.

but that was when we were testing the new image, before it was released.

570RJ · March 25, 2022, 3:18pm

Thanks for the suggestions. I reverted to the image with ID f1a22a725dd5 and disabled auto update of my docker nodes. Once the GitHub issue is fixed, I will switch to the latest tag again.

Alexey · March 25, 2022, 4:57pm

It happens only if you use --network host, because all containers then listen on the host network, and thus each must have a unique port. The supervisor is started on a fixed port. This is the essence of the bug. The supervisor needs to either listen on a random free port, allow the listening port to be configured, or use UNIX sockets (since there should not be external activity).
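The conflict described above can be reproduced without Docker or supervisord. The following is a minimal sketch in Python, not the actual supervisord code: the helper `listen` is a hypothetical name, and the "fixed port" is simply whatever free port the OS hands to the first listener. It shows that a second listener on the same address and port fails, while asking the OS for port 0 (a random free port) succeeds.

```python
import socket


def listen(port: int) -> socket.socket:
    """Open a TCP listener on localhost; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


# First "node": stands in for the supervisor that already holds the fixed port.
first = listen(0)
fixed_port = first.getsockname()[1]

# Second "node" sharing the same network namespace tries the same fixed port.
try:
    second = listen(fixed_port)
    second.close()
    conflict = False
except OSError as err:
    conflict = True
    print(f"second listener on port {fixed_port} failed: {err}")

# One of the suggested fixes: let the OS choose a random free port instead.
third = listen(0)
random_port = third.getsockname()[1]
print(f"third listener got free port {random_port}")

first.close()
third.close()
```

With a bridge network each container has its own network namespace, so the same fixed port can be bound once per container without a clash.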

570RJ · April 13, 2022, 5:09am

Any update on when this will be fixed? I see there have been two more releases since the issue was introduced, but the GitHub issue is still open. Right now I cannot update any of my servers because of this.

Alexey · April 13, 2022, 5:35am

You can track this issue: [storagenode] The multiple nodes with `--network host` after upgrade to 1.50.4 cannot work anymore · Issue #4661 · storj/storj · GitHub

At the moment I can suggest removing --network host and using port mapping; this resolves the issue.
The other solution is to migrate to binaries, since you already have unique ports for all needed services. See

and

570RJ · April 21, 2022, 5:41pm

I have 25 nodes over 5 servers and unfortunately I cannot do either of them due to the complexity and time involved. Changing docker to not use host networking requires me to manually map all the ports used.

It is already non-trivial to run the nodes, and the payout does not make it worth spending more time on maintaining a server every time it breaks due to an update or new feature. I don’t understand why Storj thinks it is ok to break existing deployments instead of being backwards compatible. I am running nodes because I had hosting automated with updates and it did not require time from me.

Currently I am just going to leave the server as is. If Storj decides to fix the issue before my server gets disqualified, then I will enable automatic docker updates to the recent docker image. Otherwise, if the nodes get disqualified, then that’s it for my SNO journey.

Alexey · April 21, 2022, 7:54pm

Yes, non-standard configurations were not tested. In our documentation the docker nodes are configured with port mapping.
I would kindly ask you to update your docker run commands with port mappings instead of using --network host.
I’m not running that many nodes, but I use a docker-compose.yaml file, so any modification and re-run is a piece of cake.
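For someone with many nodes, the port mappings do not have to be typed by hand. Below is a rough sketch of generating them with a script. Everything specific in it is an assumption for illustration: the container ports 28967 (node traffic, TCP and UDP) and 14002 (dashboard), the image name `storjlabs/storagenode:latest`, the container names, and the scheme of adding the node index to the host port. The mounts and environment variables that a real node needs are deliberately left out, so the output is a starting point and not a complete command.

```python
from typing import List

NODE_PORT = 28967       # assumed container port for node traffic (TCP and UDP)
DASHBOARD_PORT = 14002  # assumed container port for the web dashboard
IMAGE = "storjlabs/storagenode:latest"  # assumed image name


def port_flags(index: int) -> List[str]:
    """Return `-p` flags that give node number `index` its own host ports."""
    host_node = NODE_PORT + index
    host_dash = DASHBOARD_PORT + index
    return [
        "-p", f"{host_node}:{NODE_PORT}/tcp",
        "-p", f"{host_node}:{NODE_PORT}/udp",
        # Dashboard is published on localhost only in this sketch.
        "-p", f"127.0.0.1:{host_dash}:{DASHBOARD_PORT}",
    ]


def docker_run_command(index: int) -> str:
    """Build an incomplete `docker run` line (no mounts, no env variables)."""
    args = ["docker", "run", "-d", "--name", f"storagenode{index}"]
    args += port_flags(index)
    args.append(IMAGE)
    return " ".join(args)


for i in range(3):
    print(docker_run_command(i))
```

Each node keeps the same ports inside its container and only the host side changes, so the externally advertised address of each node would have to point at its own host port.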

Alexey · April 22, 2022, 8:25am

4 posts were split to a new topic: Processes entered FATAL state, too many start retries too quickly

SGC · April 23, 2022, 8:02am

you could just drop the

--network host

parameter, i don’t really see the point in using that. i only use one nic on my server for the lan and it has like 15 different vm’s, each with their own ip address. even some of the nodes share containers, which means a single ip… so they all run on different ports…

dunno what the point of that parameter is, aside from making docker stick to the host network…
i guess if you are on a corporate network and cannot expand the ip range you got… but then you could most likely just do a vlan or similar…

if the whole issue comes from the --network host parameter, it seems to me the easiest way to fix it would be to drop that parameter.

i just use the docker run command and Notepad++
Notepad++ makes it fairly easy to change and replace stuff.

sure there are configurations i would hate to have to deal with… like for one i’m pondering moving my nodes into a vm and do HBA passthrough to try and solve a problem i’ve been chasing for a couple of years now…

so i doubt it will even work… but i might attempt that, but this will mean yet another complete restructuring of my storj node setup, which will be like the 4th or 5th time…
the process is getting more and more streamlined every time…
port mapping and ip’s so it makes sense and is easy to read…

sorry that you have issues… but i think the main issue seems to be related to planning and custom solutions.
it’s a good thing to try to stick to the defaults, because the more fringe stuff gets, the less tested it will be…