Your node has gone offline

This is the second time my node has gone offline since… I don’t know exactly, but since around version 1.74-1.76. This never happened before, and my node is about 5 months old.
(The first time was on May 28th, at around the same time of day as this one. Curiously, both happened at dawn on a Sunday, EU time.)

I woke up to emails from the satellites, sent at very different times, saying that my node had gone offline.
Checking the storagenode logs, I see a ton of “database is locked” entries.
Later there are a lot of “order: grace period passed for order limit”, “ping satellite: failed to ping storage node, your node indicated error code: 4, rpc: tcp connector failed: rpc: context deadline exceeded” and “order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled”.
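To see which of these errors came first and how often each one occurs, the log lines can be tallied by message type. The following is only a minimal sketch: the substrings are taken from the messages quoted above, while the labels and the function name `count_errors` are made up for illustration.

```python
from collections import Counter

# Substrings copied from the log messages quoted in this post.
# The labels on the left are illustrative names, not storagenode terms.
PATTERNS = {
    "database locked": "database is locked",
    "order grace period": "grace period passed for order limit",
    "deadline exceeded": "context deadline exceeded",
    "dns lookup canceled": "operation was canceled",
}


def count_errors(lines):
    """Count how many lines contain each known error substring."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Small demo on hand-written lines; real input would be the node's log.
demo = [
    "ERROR piecestore upload failed: database is locked",
    "ERROR orders order: grace period passed for order limit",
]
for label, n in count_errors(demo).most_common():
    print(f"{n:6d}  {label}")
```

With a real log, `count_errors(open(path))` would work the same way, where `path` is wherever the node's log is written on that system.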

What is more interesting is that a couple of versions back, when something like this happened, the storagenode binary just killed itself and systemd restarted it (I’m using systemd to run storagenode and the updater), since the lines:

Restart      = always
RestartSec   = 5

are in the .service file.

In this case, when I manually try to stop storagenode.service, I get this many log lines:

Jun 11 09:16:23 localhost systemd[1]: storagenode.service: State 'stop-sigterm' timed out. Killing.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578496 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578497 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578499 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578500 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578502 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578503 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578504 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578566 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578568 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578569 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578570 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578571 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578572 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578573 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578574 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578584 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578585 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578586 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578587 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578588 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578589 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578590 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578591 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578592 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578594 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578595 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578596 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578605 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578620 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578621 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3578622 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3962896 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3962897 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3969650 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3977872 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983766 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983767 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3983808 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985523 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985525 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985526 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3985709 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986501 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986502 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986503 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986504 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986505 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986506 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986508 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986509 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986510 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986511 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986530 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986531 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986532 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986533 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986530 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986531 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986532 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986533 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986534 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986535 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986536 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986537 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986538 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986540 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986541 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986542 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986583 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986584 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986585 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986586 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986587 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986588 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3986589 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989213 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989576 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989577 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989597 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989620 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989621 (storagenode) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989622 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Killing process 3989623 (n/a) with signal SIGKILL.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Main process exited, code=killed, status=9/KILL
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Failed with result 'timeout'.
Jun 11 09:16:23 localhost systemd[1]: storagenode.service: Consumed 59min 25.053s CPU time.

Which means that the process was stuck! Not responding to signals.

What is happening with your binary? Would you please fix it?

Hello @raert,
Welcome to the forum!

I’m afraid something happened with your hardware. When a process cannot be stopped, it’s usually related to hardware issues, such as an unresponsive disk and/or problems with RAM/CPU.

I would suggest checking your disk for errors, then running a memory check; it’s also worth checking the CPU temperature at the time this happened.
Please also show what was in the storagenode’s log when this happened (20 lines should be enough).
What version of storagenode are you running?
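One way to check whether the process is blocked on the disk while it is stuck: on Linux, a thread waiting on I/O in uninterruptible sleep shows state `D` in `/proc/<pid>/task/<tid>/stat`, and signals (including SIGKILL) are not acted on until it leaves that state. Below is a rough sketch that lists the state of every thread of a process. The function names are made up for illustration, and this is only a diagnostic aid, not a confirmed explanation of what happened on your node.

```python
from pathlib import Path


def parse_state(stat_text):
    """Return the one-letter state from the contents of a /proc stat file.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the last ')' instead of whitespace.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def thread_states(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to its current state letter (Linux only)."""
    states = {}
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            states[int(task.name)] = parse_state((task / "stat").read_text())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were reading
    return states
```

Calling `thread_states(pid)` with the main PID of the storagenode process a few times in a row, while the node is hanging, would show whether some threads stay in `D`. If they do, that points towards the disk rather than the binary.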

It could be possible, but so far only your node behaves like this (out of 20k).
Perhaps also this one on a Raspberry Pi:

Please provide the version of your storagenode.

Please post 20 lines from your logs from the time when it happened. I guess it first goes offline for some reason, and there should be errors.

It looks like it lost the network at some point, or there was a different error before all the “upload failed” errors.
It doesn’t look like normal behavior for a typical Linux server.
Do you have any other network services running on this host? Could it be the result of exhausted TCP file descriptors or ports?
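To check the file descriptor side of this, the number of open descriptors of the process can be compared with its soft limit, both of which Linux exposes under `/proc/<pid>/`. This is a sketch only: the function names are made up for illustration, and it needs to run as root or as the same user as the node.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft = line[len(prefix):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a running process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_max_open_files(f.read())
    return open_fds, soft_limit
```

If `fd_usage(pid)` reports a count close to the limit while the node is misbehaving, descriptor exhaustion becomes a plausible cause. Port exhaustion is a separate question that this sketch does not cover.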

You may submit your idea there:

or on our GitHub (Issues · storj/storj)

Thank you!
I’m running two nodes with Docker, and there are no such errors.
But perhaps we need more reports from SNOs who run it as a service.

When I receive the email informing me that my node has gone offline and I connect to the PC with Remote Desktop, I can see that my network shows “No Internet access”. When I work on this PC in Remote Desktop, it reconnects often. To keep my nodes online, I made a Scheduled Task to restart the PC once a day. Is this the right solution?

Unlikely. You need to fix the issue with your external address not resolving. “No internet” usually means that the current DNS server doesn’t work; at the same time there could be routing issues too, or your ISP did something.
You may try configuring a DNS server on your host (or in the DHCP section of your router) to use 8.8.8.8 instead of your current DNS server (likely the ISP’s one, which has stopped working).
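Before and after changing the DNS server, you can check whether name resolution works at all on the host. A minimal sketch, assuming Python is available there; `can_resolve` is a made-up helper name, and the `resolver` parameter exists only so the check can be exercised without a network.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves through the system's configured DNS."""
    try:
        return len(resolver(host, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (DNS down, no route, etc.).
        return False
```

For example, `can_resolve("eu1.storj.io")` returning `False` while the “No Internet access” state is showing would support the idea that DNS is the part that fails. If it returns `True`, the problem is more likely elsewhere (routing, the ISP, or the network adapter).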

I had a slightly similar error. It was the network card driver, which apparently has to be updated from the mainboard manufacturer (Gigabyte) as newer Windows 10/11 releases come out.

Windows did NOT do it automatically. I had 2 PCs with this error about once a month (behind one repeater via LAN).
Luckily I fixed this before I started my second node.