So this is the second time my node has gone offline since around version 1.74-1.76 (I don't know exactly which). This never happened before, and my node is about 5 months old.
(The first time was on May 28th, at around the same time of day as this one. Curiously, both happened in the early hours of a Sunday, EU time.)
I woke up to emails from the satellites, sent at very different times, saying that my node had gone offline.
Checking the storagenode logs, I see a ton of “database is locked” messages.
Later on there are lots of these:
“order: grace period passed for order limit”
“ping satellite: failed to ping storage node, your node indicated error code: 4, rpc: tcp connector failed: rpc: context deadline exceeded”
“order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled”
But what's more interesting: a couple of versions back, when something like this happened, the storagenode binary simply exited and systemd restarted it (I use systemd to run both the storagenode and the updater), because the .service file contains these lines:
Restart = always
RestartSec = 5
This time, when I manually tried to stop storagenode.service, I got this log output:
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: State 'stop-sigterm' timed out. Killing.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578496 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578497 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578499 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578500 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578502 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578503 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578504 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578566 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578568 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578569 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578570 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578571 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578572 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578573 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578574 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578584 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578585 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578586 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578587 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578588 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578589 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578590 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578591 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578592 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578594 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578595 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578596 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578605 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578620 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578621 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578622 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3962896 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3962897 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3969650 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3977872 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983766 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983767 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983808 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985523 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985525 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985526 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985709 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986501 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986502 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986503 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986504 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986505 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986506 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986508 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986509 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986510 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986511 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986530 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986531 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986532 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986533 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986534 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986535 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986536 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986537 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986538 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986540 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986541 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986542 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986583 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986584 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986585 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986586 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986587 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986588 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986589 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989213 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989576 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989577 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989597 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989620 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989621 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989622 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989623 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Main process exited, code=killed, status=9/KILL
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Failed with result 'timeout'.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Consumed 59min 25.053s CPU time.
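This means the process was stuck and not responding to signals: it ignored SIGTERM until systemd's stop timeout ('stop-sigterm' timed out) expired, and systemd had to kill it with SIGKILL.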
What is happening with your binary? Would you please fix it?
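To illustrate the behavior I'd expect instead, here is a minimal Python sketch of a stop path with a hard deadline. It is not Storj's code (storagenode is written in Go), and the names `stop_with_deadline`, `install_sigterm_handler`, and the 30-second default are hypothetical, made up only for this example. The idea: if cleanup blocks (for example on a locked database), the process exits anyway after the deadline instead of hanging until systemd has to SIGKILL it.

```python
import os
import signal
import threading


def stop_with_deadline(stop, timeout):
    """Run the (hypothetical) cleanup callable `stop` in a background thread.

    Returns True if it finished within `timeout` seconds, or False if it is
    still blocked (e.g. waiting on a locked database) when the deadline passes.
    """
    worker = threading.Thread(target=stop, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def install_sigterm_handler(stop, timeout=30.0):
    """On SIGTERM, attempt a graceful stop, then force the process to exit."""

    def handler(signum, frame):
        clean = stop_with_deadline(stop, timeout)
        # os._exit skips any further cleanup, so a wedged thread cannot keep
        # the process alive; the non-zero code marks the stop as unclean.
        os._exit(0 if clean else 1)

    signal.signal(signal.SIGTERM, handler)
```

With something like this in place, a hung shutdown turns into a process exit, which is what `Restart = always` needs in order to bring the node back without manual intervention.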