Processes entered FATAL state, too many start retries too quickly

docker stop
docker rm
docker pull
pulling…

docker run
docker logs …


2022-04-22 05:18:11,206 INFO spawned: 'processes' with pid 53
2022-04-22 05:18:11,206 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 05:18:11,206 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 05:18:11,213 INFO exited: processes (exit status 0; not expected)
2022-04-22 05:18:13,217 INFO spawned: 'processes' with pid 61
2022-04-22 05:18:13,224 INFO exited: processes (exit status 0; not expected)
2022-04-22 05:18:16,230 INFO spawned: 'processes' with pid 63
2022-04-22 05:18:16,237 INFO exited: processes (exit status 0; not expected)
2022-04-22 05:18:16,237 INFO gave up: processes entered FATAL state, too many start retries too quickly

WTF?

I'm seeing similar logs, but the node seems to work despite these strange errors.

Do you have multiple docker nodes and are you using --network host? If so, please replace --network host with port mapping.

Now I'm seeing this:


2022-04-22 05:39:44,279 INFO gave up: processes entered FATAL state, too many start retries too quickly
2022-04-22T05:54:37.571Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-22T05:54:38.226Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-22T05:54:38.226Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode"}
2022-04-22T05:54:38.262Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-22T05:54:38.263Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode-updater"}

1 docker container

Please replace --network host with port mapping.

Like this?
docker run -d --restart unless-stopped -p 28967:28967/tcp -p 28967:28967/udp
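For reference, a full port-mapped run command typically has roughly the following shape (a sketch based on the standard documented command of that era; every path and value below is a placeholder, not taken from this thread):

```shell
docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 28967:28967/tcp -p 28967:28967/udp \
  -p 127.0.0.1:14002:14002 \
  -e WALLET="0x..." \
  -e EMAIL="user@example.com" \
  -e ADDRESS="external.address.tld:28967" \
  -e STORAGE="2TB" \
  --mount type=bind,source=/path/to/identity,destination=/app/identity \
  --mount type=bind,source=/path/to/storage,destination=/app/config \
  --name storagenode storjlabs/storagenode:latest
```

The key difference from --network host is that only the listed ports are published, so several nodes can share one machine by mapping different host ports.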

This is the normal self-update process. Docker nodes now use the same update mechanism as binary nodes. We roll out new versions gradually so that the whole network doesn't shut down at once.

Yes, if you used port mapping, then your issue is not related to the previous thread, so I moved it to a separate one (this thread).

I just switched my node from the bfc3d4c9d-v1.52.2-go1.17.5 tag to the latest tag and see the same “FATAL state” error. However, the node appears to run correctly. No --network host is used here. It seems that a process called ‘processes’ is created but can’t stay up, so it enters the FATAL state. Unfortunately this is misleading when reading the logs.

2022-04-22 11:12:15,154 INFO RPC interface 'supervisor' initialized
2022-04-22 11:12:15,155 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2022-04-22 11:12:15,156 INFO supervisord started with pid 1
2022-04-22 11:12:16,161 INFO spawned: 'processes' with pid 40
2022-04-22 11:12:16,169 INFO spawned: 'storagenode' with pid 41
2022-04-22 11:12:16,176 INFO spawned: 'storagenode-updater' with pid 42
2022-04-22 11:12:16,211 INFO exited: processes (exit status 0; not expected)
2022-04-22T11:12:16.261Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-bandwidth"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "contact.external-address"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-disk-space"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "operator.email"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "server.private-address"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "operator.wallet"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file key  {"Key": "server.address"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file value for key        {"Key": "log.caller"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file value for key        {"Key": "log.development"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file value for key        {"Key": "log.level"}
2022-04-22T11:12:16.261Z        INFO    Invalid configuration file value for key        {"Key": "log.stack"}
2022-04-22T11:12:16.297Z        INFO    Running on version      {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-22T11:12:16.298Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-22T11:12:16.756Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-22T11:12:16.757Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode"}
2022-04-22T11:12:16.793Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-22T11:12:16.793Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode-updater"}
2022-04-22 11:12:17,798 INFO spawned: 'processes' with pid 77
2022-04-22 11:12:17,800 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 11:12:17,800 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 11:12:17,829 INFO exited: processes (exit status 0; not expected)
2022-04-22 11:12:19,837 INFO spawned: 'processes' with pid 83
2022-04-22 11:12:19,871 INFO exited: processes (exit status 0; not expected)
2022-04-22 11:12:22,882 INFO spawned: 'processes' with pid 84
2022-04-22 11:12:22,906 INFO exited: processes (exit status 0; not expected)
2022-04-22 11:12:22,908 INFO gave up: processes entered FATAL state, too many start retries too quickly

We have an event listener named “processes” that exits the docker container by running the /bin/stop-supervisor script if the processes (storagenode and storagenode-updater) fail to start. Maybe we should rename it to “processes-event-listener”. storj/supervisord.conf at 847ddaaab09ddf415d09fd4a29e15b72c2fa2e41 · storj/storj · GitHub
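For readers unfamiliar with supervisord, such an event listener is wired up roughly like this (a sketch of the mechanism, not the exact contents of the linked supervisord.conf; the key names are real supervisord options):

```ini
; An [eventlistener:x] section subscribes a command to supervisord events.
[eventlistener:processes]
command=/bin/stop-supervisor
events=PROCESS_STATE_FATAL    ; fire when a program gives up retrying
```

Supervisord restarts the listener like any other program, which is why a listener that exits immediately produces the repeated "spawned"/"exited ... (not expected)" lines in the logs above.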

It seems supervisord is spawning the “processes” event listener as a separate process, and for whatever reason (my guess: the shell script fails under Debian bash) it exits immediately. I will open a new ticket to investigate that.
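For context, a supervisord event listener has to keep looping over the READY/RESULT protocol; a script that exits (even with status 0) triggers exactly the "exited: ... (not expected)" and "entered FATAL state" lines above. A minimal sketch of a well-behaved loop in POSIX sh (an illustration of the protocol, not the actual /bin/stop-supervisor script):

```shell
# Sketch of the supervisord event-listener protocol (not the real
# /bin/stop-supervisor). The listener prints READY, supervisord replies
# with one header line plus a payload of "len" bytes, and the listener
# must acknowledge with RESULT, then loop back to READY.
listener_loop() {
  while printf 'READY\n'; do
    IFS= read -r header || return 0      # stop only when stdin closes
    len=${header##*len:}                 # payload length from the header
    if [ "${len:-0}" -gt 0 ] 2>/dev/null; then
      head -c "$len" >/dev/null          # consume and discard the payload
    fi
    # A real listener would inspect $header here (e.g. for
    # eventname:PROCESS_STATE_FATAL) and shut supervisord down.
    printf 'RESULT 2\nOK'                # acknowledge: 2 = length of "OK"
  done
}
```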

This, however, shouldn’t affect the storagenode and storagenode-updater processes or stop them from running.


Opened an issue on GitHub and submitted a quick PR to fix it: [storagenode]: Processes entered FATAL state, too many start retries too quickly · Issue #4772 · storj/storj · GitHub

2022-06-02 17:07:34,444 INFO spawned: 'processes' with pid 90
2022-06-02 17:07:34,450 INFO exited: processes (exit status 0; not expected)
2022-06-02 17:07:34,451 INFO gave up: processes entered FATAL state, too many start retries too quickly

Same problem on Debian 10.
--network host is not used.

What’s the version of storagenode?