Container not responsive

jammerdan · May 24, 2023, 4:07am

Sometimes, after a container has been reaped, it is no longer responsive, although it should restart automatically. The logs do not show anything; it is just halted.

Then I try to stop and remove the container manually. The logs show:

2023-05-24 02:35:03,431 WARN received SIGTERM indicating exit request
2023-05-24 02:35:03,445 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-24T02:35:03.577Z        INFO    Got a signal from the OS: "terminated"{"Process": "storagenode-updater"}
2023-05-24 02:35:03,650 INFO stopped: storagenode-updater (exit status 0)
2023-05-24 02:35:06,654 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:09,658 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:12,662 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:13,663 WARN killing 'storagenode' (57) with SIGKILL
2023-05-24 02:35:15,666 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:18,670 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:21,674 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:23,676 WARN killing 'storagenode' (57) with SIGKILL
2023-05-24 02:35:24,678 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-24 02:35:27,682 INFO waiting for storagenode, processes-exit-eventlistener to die

Docker outputs:

Error response from daemon: cannot stop container: storagenode: tried to kill container, but did not receive an exit event

The container remains present. It seems that both the node software and Docker are waiting for something that never happens, so the container can be neither stopped nor removed.
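The sequence in the two logs above (SIGTERM, a grace period, SIGKILL, then waiting for an exit that never arrives) can be modeled with a minimal Python sketch. This is not Docker's or supervisord's actual code; the function name `stop_with_escalation` and the timeout values are illustrative assumptions. The "stuck" outcome corresponds to Docker's "tried to kill container, but did not receive an exit event".

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc: subprocess.Popen, term_timeout: float, kill_timeout: float) -> str:
    """Hypothetical helper: stop a child roughly the way `docker stop` escalates.

    Returns "terminated" (exited after SIGTERM), "killed" (exited after SIGKILL)
    or "stuck" (SIGKILL was sent but no exit was observed in time).
    """
    proc.send_signal(signal.SIGTERM)  # polite shutdown request, may be handled or ignored
    try:
        proc.wait(timeout=term_timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass

    proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored by the process
    try:
        proc.wait(timeout=kill_timeout)
        return "killed"
    except subprocess.TimeoutExpired:
        # The process still has not exited, e.g. because it is blocked inside the kernel.
        return "stuck"


if __name__ == "__main__":
    # Throwaway child that just sleeps; Python's default SIGTERM action ends it.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_escalation(child, term_timeout=2, kill_timeout=2))
```

Run directly, it stops a throwaway sleeping child, which ends at the SIGTERM step. A process that still has not exited after SIGKILL falls through to "stuck", which is the state the storagenode container appears to be in.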

Alexey · May 24, 2023, 4:25am

Usually, an inability to stop the container is related to hardware.

What’s the storagenode version?
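The hardware hint fits the supervisord log: SIGKILL is sent twice and storagenode still does not exit. On Linux, a process that does not act on SIGKILL is usually in uninterruptible sleep (state `D` in `/proc/<pid>/stat`), most often blocked on disk I/O. Below is a hedged Python sketch that lists such processes; the helper names `parse_stat` and `uninterruptible` are illustrative, and it is a diagnostic aid, not part of Docker or storagenode. Run it on the host: host PIDs differ from the in-container PID 57 shown in the log.

```python
import os
from typing import Iterator, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Parse pid, comm and state from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the split happens at the *last* closing parenthesis.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

A process that shows up here repeatedly, over several minutes, points at the storage path rather than at Docker itself.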

jammerdan · May 24, 2023, 2:02pm

This particular one was v1.76.2.

Alexey · May 25, 2023, 2:22am

Do you have errors in the output of journalctl or dmesg when you stop the container?
I suspect the disk subsystem and/or high RAM usage.
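To answer that question quickly, the kernel log (from `dmesg -T` or `journalctl -k`) can be filtered for the three usual suspects: OOM kills, hung tasks and block-device I/O errors. This is a minimal Python sketch; the regex patterns are assumptions about typical message wording, which varies between kernel versions, and the helper name `classify` is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed message shapes; adjust them to what your kernel actually prints.
PATTERNS = {
    "oom": re.compile(r"Out of memory|oom-kill|Killed process \d+"),
    "hung_task": re.compile(r"task \S+ blocked for more than \d+ seconds"),
    "io_error": re.compile(r"I/O error|blk_update_request|Buffer I/O error"),
}


def classify(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (category, line) for every line matching one of PATTERNS."""
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.rstrip("\n")))
                break  # one category per line is enough
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg -T > kernel.log && python3 scan_kernel_log.py kernel.log
    with open(sys.argv[1]) as f:
        for category, line in classify(f):
            print(f"{category:10} {line}")
```

OOM hits point at the memory limit, while hung-task and I/O-error hits point at the disk subsystem.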

jammerdan · May 25, 2023, 5:13am

"Container failed to exit within 5m0s of signal 15 - using the force"
"Container failed to exit within 10s of kill - trying direct SIGKILL"  error="context deadline exceeded"
"error killing container: context deadline exceeded" error="tried to kill container, but did not receive an exit event"
"Handler for POST /v1.42/containers/storagenode/stop returned error: cannot stop container: storagenode: tried to kill container, but did not receive an exit event"

The container had been reaped by the OOM killer because of the RAM limit set in the run command.

A process of this unit has been killed by the OOM killer.
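From Docker's side, `docker inspect` reports whether it recorded an OOM kill (`State.OOMKilled`), the memory limit from the run command (`HostConfig.Memory`, in bytes, 0 meaning no limit) and the restart policy (`HostConfig.RestartPolicy.Name`). A restart policy only takes effect once Docker sees the container exit, which is consistent with the missing restart when the exit event never arrives. A minimal Python sketch that pulls those fields out of saved `docker inspect` output; the helper name `summarize` is hypothetical.

```python
import json
import sys
from typing import Any, Dict, List


def summarize(inspect_json: str) -> List[Dict[str, Any]]:
    """Extract OOM-related fields from `docker inspect` output (a JSON array)."""
    summary = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        host = container.get("HostConfig", {})
        summary.append({
            "name": container.get("Name", "").lstrip("/"),
            "status": state.get("Status"),
            "oom_killed": state.get("OOMKilled"),
            "pid": state.get("Pid"),
            "memory_limit": host.get("Memory"),  # bytes; 0 means no limit
            "restart_policy": host.get("RestartPolicy", {}).get("Name"),
        })
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: docker inspect storagenode > inspect.json && python3 oom_summary.py inspect.json
    with open(sys.argv[1]) as f:
        for entry in summarize(f.read()):
            print(entry)
```

A container that reports `oom_killed: True` together with a status of "running" would match the state described below, where Docker still believes the container is alive.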

Normally it should restart then, but it looks like this did not happen. It seems that reaping did not fully remove the container, so Docker believes it is still running while in reality it has already shut down. So it keeps waiting and waiting and waiting.
So maybe next time, regardless of any underlying issue, I’ll try to remove the container files manually and see whether it restarts then. It seems that Docker does not have a command to do that by itself.

Alexey · May 25, 2023, 7:56am

You may also try restarting the Docker daemon.