Server frozen, nodes trying to update

Today I woke up and my server was kind of frozen. I checked the logs and saw that my nodes were all offline and doing this all the time:

2023-10-17T04:41:06Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.88.3"}
2023-10-17T04:41:06Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-10-17T04:56:05Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-10-17T04:56:07Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.88.3"}
2023-10-17T04:56:07Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode"}
2023-10-17T04:56:07Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.88.3"}
2023-10-17T04:56:07Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-10-17T05:11:05Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-10-17T05:11:06Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.88.3"}
2023-10-17T05:11:06Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode"}
2023-10-17T05:11:06Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.88.3"}
2023-10-17T05:11:06Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-10-17T05:26:05Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-10-17T05:26:06Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.88.3"}
2023-10-17T05:26:06Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode"}
2023-10-17T05:26:06Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.88.3"}
2023-10-17T05:26:06Z    INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode-updater"}

Because all nodes were doing this, I think my server got overloaded.

Strangely, I am not able to stop the Docker containers. I am also not able to shut down the whole system. It just doesn’t do anything. Even kill -9 PID didn’t work.
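
One possible explanation (not confirmed anywhere in this thread) for kill -9 having no effect on Linux is that the process is in uninterruptible sleep, shown as state D, usually because it is waiting on disk I/O that never completes. The sketch below lists such processes by reading /proc/<pid>/stat. It assumes a Linux host; the function names are made up for illustration.

```python
import os


def parse_stat_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces or ')',
    # so cut at the last ')' instead of splitting the whole line on whitespace.
    after_comm = stat_text[stat_text.rfind(")") + 1:].split()
    return after_comm[0] if after_comm else None


def uninterruptible_pids(proc_root="/proc"):
    """Return the sorted PIDs of processes currently in state D."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no procfs here (for example, not a Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, encoding="utf-8", errors="replace") as handle:
                state = parse_stat_state(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    print("PIDs in uninterruptible sleep:", uninterruptible_pids())
```

If the storagenode PIDs keep showing up in that list, it would be consistent with stuck I/O rather than with the updater itself.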

If you redirected the logs to a file, then docker logs will show only the logs from storagenode-updater, not from storagenode.
You need to check the node’s logs.
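
A quick way to see which process actually wrote the lines is to count them by their "Process" field. This is only a rough sketch: it assumes each line has the shape shown in the first post (timestamp, level, message, then a JSON object), and the helper names are hypothetical. The two sample lines are copied from that post.

```python
import json
from collections import Counter


def parse_line(line):
    """Split one log line into (timestamp, level, message, fields), or None."""
    brace = line.find("{")
    head, fields = line, {}
    if brace != -1:
        head = line[:brace]
        try:
            fields = json.loads(line[brace:])
        except ValueError:
            fields = {}  # trailing part was not valid JSON; keep the text only
    parts = head.split(None, 2)
    if len(parts) < 3:
        return None  # not a log line in the assumed format
    timestamp, level, message = parts
    return timestamp, level, message.strip(), fields


def count_by_process(lines):
    """Count log lines per value of the "Process" field."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[3].get("Process", "<none>")] += 1
    return counts


# Two lines copied from the log in the first post.
SAMPLE = [
    '2023-10-17T04:56:05Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}',
    '2023-10-17T04:56:07Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.88.3"}',
]

if __name__ == "__main__":
    print(count_by_process(SAMPLE))
```

Both sample lines are counted under storagenode-updater, even the one whose "Service" is storagenode, so lines like these come from the updater and not from the node itself.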

I didn’t redirect them; these were the storagenode logs. It seems like the node wasn’t doing anything else.

This usually suggests a hardware issue. Perhaps one of the disks is misbehaving. Please check all of them and their S.M.A.R.T. status.

I would suggest rebooting the system and monitoring it for a while. If the sudo reboot now command doesn’t work, then it’s definitely a hardware issue, maybe even something worse than just a disk problem.
It’s also worth checking dmesg or journalctl for hardware issues, including disk errors.
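
To make the dmesg/journalctl check less tedious, something like the following could filter a saved kernel log for lines that often accompany disk or I/O trouble. It is a minimal sketch: the pattern list is an assumption, it is far from exhaustive, and a match is only a hint to look closer, not a diagnosis. The function names are hypothetical.

```python
import re

# Assumed, non-exhaustive fragments that often appear in Linux kernel messages
# when a disk or its link is in trouble. Adjust to your own hardware.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"blocked for more than \d+ seconds",
    r"failed command",
    r"hard resetting link",
]


def find_suspect_lines(text, patterns=SUSPECT_PATTERNS):
    """Return the lines of `text` that match any of the given patterns."""
    regex = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    return [line for line in text.splitlines() if regex.search(line)]


def scan_file(path):
    """Scan a saved kernel log, e.g. the output of dmesg written to a file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_lines(handle.read())
```

For example, save the output of dmesg (or journalctl -k) to a file and pass its path to scan_file. An empty result does not prove the hardware is healthy, so the S.M.A.R.T. check is still worth doing.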

I had this problem maybe a year ago. I cut the power to reboot my server. I wasn’t able to find any clues in the log files.
I thought it was something with the updater, but I couldn’t find any errors.

After the hard reboot it hasn’t happened again.

Had something strange going from 1.88.x to 1.89.5 just now. I watched the dashboard, and the time did not reset to zero when I reloaded it. It always showed 59 min ago, and then the dashboard would not load.

The storj process was marked as shutting down under the Windows services tab.

UptimeRobot was not triggered??? (Edit: maybe the time was too short, because I noticed the silent drive first.)

No fatal log entries.
Restarted the updater: nothing. Right-clicking on the process gave no options to restart, stop, or anything else.
Reboot: all normal again.

Maybe this is the “could not stop process” error I heard about here in the forum?

Yes, the described behavior is related to hardware issues. I would recommend checking your disks for errors and their S.M.A.R.T. status. It could also be a problem with RAM, so you need to test that too. Sometimes it is the result of too low or too high voltage from the power supply (the PSU is about to die), or just dried-out thermal paste under the CPU.

Tested both, no problems.

A power supply problem would cause a reset by the motherboard, which monitors the power supply (it would not freeze only one service).

It’s half a year old, AIO with decent temps under full load.

I think I can rule all of that out.

Then perhaps a too-fresh Windows update :)
For example, my PC cannot accept a new Windows version until it has been patched at least two times since release.

However, the inability to stop the process is concerning and is usually related to hardware rather than software.

No, that was a different issue: the service did not respond to stop requests, and the node continued to work as usual, serving traffic, responding to audits and uptime checks, etc.