Yet another “node offline after update” thread

Two of my nodes went offline around the time of a version update.

How I read the logs:
storagenode-updater: let's do the update!

2023-05-19T08:48:54.207Z INFO Download started.

storagenode: interrupted

2023-05-19T08:49:05.271Z INFO Got a signal from the OS: “interrupt” {“Process”: “storagenode”}

storagenode-updater: restart service

2023-05-19T08:49:05.845Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}

storagenode-updater: some problem

2023-05-19 08:49:06,105 INFO exited: storagenode-updater (exit status 1; not expected)
2023-05-19 08:49:06,107 INFO spawned: ‘storagenode-updater’ with pid 3931

storagenode: started

2023-05-19T08:49:06.184Z INFO Configuration loaded

storagenode-updater: happy with the storagenode

2023-05-19 08:49:07,163 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

storagenode-updater: needs to stop storagenode. Why? Because of the problem at 2023-05-19 08:49:06,105?

2023-05-19 08:49:12,623 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19 08:49:15,627 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19 08:49:16,628 WARN killing ‘storagenode-updater’ (3931) with SIGKILL
2023-05-19 08:49:16,632 INFO stopped: storagenode-updater (terminated by SIGKILL)

storagenode: terminated and stopped

2023-05-19T08:49:16.632Z INFO Got a signal from the OS: “terminated” {“Process”: “storagenode”}

Now the init check is also terminated; I think this is only a consequence:

2023-05-19T08:49:16.666Z FATAL Unrecoverable error {“Process”: “storagenode”, “error”: “system clock is out of sync: system clock is out of sync with all trusted satellites”

Do you have any thoughts? I think this happens to me pretty regularly.

Sorry if this is a duplicate of another thread; I have tried to find anything recent and close to this, believe me :slight_smile:

Docker logs around the update

2023-05-19T08:33:54.193Z INFO New version is being rolled out but hasn’t made it to this node yet {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-05-19T08:33:54.207Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.76.2”}
2023-05-19T08:33:54.207Z INFO New version is being rolled out but hasn’t made it to this node yet {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-05-19T08:48:48.779Z INFO Downloading versions. {“Process”: “storagenode-updater”, “Server Address”: “https://version.storj.io”}
2023-05-19T08:48:54.206Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode”, “Version”: “v1.76.2”}
2023-05-19T08:48:54.207Z INFO Download started. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.78.3/storagenode_linux_amd64.zip”, “To”: “/tmp/storagenode_linux_amd64.1183188037.zip”}
2023-05-19T08:49:05.007Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.78.3/storagenode_linux_amd64.zip”, “To”: “/tmp/storagenode_linux_amd64.1183188037.zip”}
2023-05-19T08:49:05.029Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-05-19T08:49:05.271Z INFO Service restarted successfully. {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-05-19T08:49:05.281Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.76.2”}
2023-05-19T08:49:05.281Z INFO Download started. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.78.3/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.4176807791.zip”}
2023-05-19 08:49:05,591 INFO exited: storagenode (exit status 0; expected)
2023-05-19 08:49:06,104 INFO spawned: ‘storagenode’ with pid 3930
2023-05-19T08:49:05.834Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.78.3/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.4176807791.zip”}
2023-05-19T08:49:05.845Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-05-19 08:49:06,105 INFO exited: storagenode-updater (exit status 1; not expected)
2023-05-19 08:49:06,107 INFO spawned: ‘storagenode-updater’ with pid 3931
2023-05-19 08:49:06,107 WARN received SIGQUIT indicating exit request
2023-05-19 08:49:06,111 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19T08:49:06.147Z INFO Configuration loaded {“Process”: “storagenode-updater”, “Location”: “/app/config/config.yaml”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “contact.external-address”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.email”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.wallet”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “storage.allocated-disk-space”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “storage.allocated-bandwidth”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “server.address”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “server.private-address”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.wallet-features”}
2023-05-19T08:49:06.147Z INFO Invalid configuration file value for key {“Process”: “storagenode-updater”, “Key”: “log.encoding”}
2023-05-19T08:49:06.159Z INFO Anonymized tracing enabled {“Process”: “storagenode-updater”}
2023-05-19T08:49:06.160Z INFO Running on version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.78.3”}
2023-05-19T08:49:06.161Z INFO Downloading versions. {“Process”: “storagenode-updater”, “Server Address”: “https://version.storj.io”}
2023-05-19 08:49:07,163 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-05-19 08:49:09,166 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19T08:49:11.616Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode”, “Version”: “v1.78.3”}
2023-05-19T08:49:11.616Z INFO Version is up to date {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-05-19T08:49:11.621Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.78.3”}
2023-05-19T08:49:11.621Z INFO Version is up to date {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-05-19 08:49:12,623 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19 08:49:15,627 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-19 08:49:16,628 WARN killing ‘storagenode-updater’ (3931) with SIGKILL
2023-05-19 08:49:16,632 INFO stopped: storagenode-updater (terminated by SIGKILL)
2023-05-19 08:49:16,670 INFO stopped: storagenode (exit status 1)
2023-05-19 08:49:16,672 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Storagenode logs around the update

2023-05-19T08:49:02.221Z INFO piecestore upload started {“Process”: “storagenode”, “Piece ID”: “X5GHOH7UTCDX2VH26BCLUFICBF2I6G2TXKFRTO6QYA52JFVHKO3Q”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Available Space”: 10185355616256, “Remote Address”: “10.10.1.1:55512”}
2023-05-19T08:49:02.637Z INFO piecestore download started {“Process”: “storagenode”, “Piece ID”: “RHNI5JTXBHVXGHBKW5KXGZXFAWUJJKJPYXNNRCCTGVUPYZYQFMAA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1024, “Remote Address”: “10.10.1.1:41690”}
2023-05-19T08:49:02.778Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “RHNI5JTXBHVXGHBKW5KXGZXFAWUJJKJPYXNNRCCTGVUPYZYQFMAA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1024, “Remote Address”: “10.10.1.1:41690”}
2023-05-19T08:49:02.854Z INFO piecedeleter delete piece sent to trash {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “3ZKLE4DPNE4RJFOUI4AWB3XB2MV2VS7KTAQ4ELRQAQ33L4ODVFCA”}
2023-05-19T08:49:03.783Z INFO piecestore download started {“Process”: “storagenode”, “Piece ID”: “KJI4LZV3YZHA6SSQIBMTULYPKYIQSBZRJNMTSR2SRIS5J3FUHWVQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1280, “Remote Address”: “10.10.1.1:41700”}
2023-05-19T08:49:04.017Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “KJI4LZV3YZHA6SSQIBMTULYPKYIQSBZRJNMTSR2SRIS5J3FUHWVQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1280, “Remote Address”: “10.10.1.1:41700”}
2023-05-19T08:49:04.682Z INFO piecestore download started {“Process”: “storagenode”, “Piece ID”: “QUKL6UC5ZWCQPSZY7VTT5KGBXOO7JP2LUK762CVXZ5XXDMXPP3AQ”, “Satellite ID”: “12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo”, “Action”: “GET_AUDIT”, “Offset”: 11776, “Size”: 256, “Remote Address”: “10.10.1.1:41708”}
2023-05-19T08:49:04.786Z INFO piecestore download started {“Process”: “storagenode”, “Piece ID”: “K6UKYGIAXB645CELRAIPHQCNY2YFLZUMBBBXCIPE2QNBHM36BTLQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 724224, “Size”: 2560, “Remote Address”: “10.10.1.1:41710”}
2023-05-19T08:49:04.813Z INFO piecestore upload started {“Process”: “storagenode”, “Piece ID”: “M33ENCPTZZGR662GCU5QLGMKYXV2PF723ZYNV64AV54J5BYXGAQQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Available Space”: 10185355079680, “Remote Address”: “10.10.1.1:34762”}
2023-05-19T08:49:04.875Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “QUKL6UC5ZWCQPSZY7VTT5KGBXOO7JP2LUK762CVXZ5XXDMXPP3AQ”, “Satellite ID”: “12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo”, “Action”: “GET_AUDIT”, “Offset”: 11776, “Size”: 256, “Remote Address”: “10.10.1.1:41708”}
2023-05-19T08:49:04.997Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “K6UKYGIAXB645CELRAIPHQCNY2YFLZUMBBBXCIPE2QNBHM36BTLQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 724224, “Size”: 2560, “Remote Address”: “10.10.1.1:41710”}
2023-05-19T08:49:05.271Z INFO Got a signal from the OS: “interrupt” {“Process”: “storagenode”}
2023-05-19T08:49:05.273Z INFO piecestore upload canceled {“Process”: “storagenode”, “Piece ID”: “45M5WOS53UJCNM27JUB52OE4BLOIFMDBKCRTZNBNQRKUZS5UQ4DA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 131072, “Remote Address”: “10.10.1.1:54228”}
2023-05-19T08:49:05.273Z INFO piecestore upload canceled {“Process”: “storagenode”, “Piece ID”: “X5GHOH7UTCDX2VH26BCLUFICBF2I6G2TXKFRTO6QYA52JFVHKO3Q”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 458752, “Remote Address”: “10.10.1.1:55512”}
2023-05-19T08:49:05.274Z INFO piecestore upload canceled {“Process”: “storagenode”, “Piece ID”: “M33ENCPTZZGR662GCU5QLGMKYXV2PF723ZYNV64AV54J5BYXGAQQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 65536, “Remote Address”: “10.10.1.1:34762”}
2023-05-19T08:49:06.184Z INFO Configuration loaded {“Process”: “storagenode”, “Location”: “/app/config/config.yaml”}
2023-05-19T08:49:06.184Z INFO Anonymized tracing enabled {“Process”: “storagenode”}
2023-05-19T08:49:06.186Z INFO Operator email {“Process”: “storagenode”, “Address”: “”}
2023-05-19T08:49:06.186Z INFO Operator wallet {“Process”: “storagenode”, “Address”: “”}
2023-05-19T08:49:06.212Z INFO server existing kernel support for server-side tcp fast open detected {“Process”: “storagenode”}
2023-05-19T08:49:11.786Z INFO Telemetry enabled {“Process”: “storagenode”, “instance ID”: “1CpLiua4sFawBgsGzgDSTKFVsAvsHT76eiNYR66u2VA9nZBcc9”}
2023-05-19T08:49:11.786Z INFO Event collection enabled {“Process”: “storagenode”, “instance ID”: “1CpLiua4sFawBgsGzgDSTKFVsAvsHT76eiNYR66u2VA9nZBcc9”}
2023-05-19T08:49:11.889Z INFO db.migration Database Version {“Process”: “storagenode”, “version”: 54}
2023-05-19T08:49:16.632Z INFO Got a signal from the OS: “terminated” {“Process”: “storagenode”}
2023-05-19T08:49:16.633Z WARN trust Failed to fetch URLs from source; used cache {“Process”: “storagenode”, “source”: “https://www.storj.io/dcs-satellites”, “error”: “HTTP source: Get "https://www.storj.io/dcs-satellites\”: context canceled", “errorVerbose”: “HTTP source: Get "https://www.storj.io/dcs-satellites\”: context canceled\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:68\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:251\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:188\n\tstorj.io/storj/storagenode.(*Peer).Run:907\n\tmain.cmdRun:120\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:29\n\truntime.main:250"}
2023-05-19T08:49:16.634Z INFO preflight:localtime start checking local system clock with trusted satellites’ system clock. {“Process”: “storagenode”}
2023-05-19T08:49:16.634Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup europe-north-1.tardigrade.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup europe-north-1.tardigrade.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup us2.storj.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup us2.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “error”: “rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-05-19T08:49:16.635Z ERROR Failed preflight check. {“Process”: “storagenode”, “error”: “system clock is out of sync: system clock is out of sync with all trusted satellites”, “errorVerbose”: “system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:911\n\tmain.cmdRun:120\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:29\n\truntime.main:250”}
2023-05-19T08:49:16.666Z FATAL Unrecoverable error {“Process”: “storagenode”, “error”: “system clock is out of sync: system clock is out of sync with all trusted satellites”, “errorVerbose”: “system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:911\n\tmain.cmdRun:120\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:29\n\truntime.main:250”}

Oh, that Docker. If you can, save yourself with the Windows GUI version, I love it.
Almost problem-free. Had to say that.

Have you tried this? http://www.timesynctool.com/

Yet another reason to use Linux with NTP. But even Windows has its own NTP service: Configuring Systems for High Accuracy | Microsoft Learn

As I wrote, “out of sync” looks like a consequence of the storagenode init process being killed. NTP is up and running; after a manual start of the container, the node is happy with the date and time.
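For reference, on a systemd-based Linux host the clock state can be double-checked with something like this (the container name “sn1” is just an example):

timedatectl status # the system clock should be reported as synchronized and the NTP service as active
docker exec sn1 date -u ; date -u # the container and the host should print the same UTC time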
Please don’t recommend me Windows :joy:


I need to note that the restart policy of the container is set to “unless-stopped”, which is also confirmed by the “docker inspect” command. But the storagenode process was not restarted when it stopped; maybe because storagenode-updater was still up, so the container stayed up?

I will run the “docker inspect” command after it happens again, but I don't expect a different result.
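For reference, the checks I mean are along these lines (the container name “sn1” is only an example):

docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' sn1 # should print unless-stopped
docker inspect -f 'status={{.State.Status}} exit={{.State.ExitCode}} restarts={{.RestartCount}}' sn1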

Thx @BrightSilence for the idea.

Looks like some race condition.
It should

  1. download the storagenode binary and restart it,
  2. check for new versions of storagenode and storagenode-updater,
  3. download a new version of storagenode and restart it,
  4. download a new version of storagenode-updater and restart it.

but somewhere the updater did not wait for the update process to finish and restarted storagenode and itself at the wrong time, so no service remained running; see the rough sketch below.
The weird thing is: why did Docker not restart the stopped container?
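Roughly, the intended order inside the container should look like this (only a sketch based on the supervisord lines in your log, not the actual updater code):

# 1) replace the storagenode binary, restart only that service and wait until it is RUNNING again
supervisorctl restart storagenode
supervisorctl status storagenode
# 2) only then replace the storagenode-updater binary and restart it
supervisorctl restart storagenode-updater

In your log the two restarts fire almost back to back; supervisord reports “exited: storagenode-updater (exit status 1; not expected)” and then tears the whole container down.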

Yeah, the container was stopped when I started to look into the problem. Is it common that two archives are downloaded at the same time and the relevant processes are restarted - storagenode_linux_amd64.1183188037.zip and storagenode-updater_linux_amd64.4176807791.zip?
Edit: Sorry, the answer is already in your post @Alexey

Do you have any idea why the updater can stop with an error? I tried to look at the source on GitHub, but I am not a Go person :slightly_smiling_face:

2023-05-19 08:49:06,105 INFO exited: storagenode-updater (exit status 1; not expected)

I have checked the free space on /tmp; there are a few GB free, and it is not a RAM disk.

yes, my docker node updated without any issues. I have another one, which is still on the old version. Will check it too.

nope. Perhaps it was corrupted or improperly unpacked.

Docker uses its own namespace for temp files. But I suppose they should be in /tmp anyway.
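You can check the free space the updater actually sees from inside the container with something like this (the container name is an example, and assuming df is present in the image):

docker exec sn1 df -h /tmp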

The best guess I have right now is that the updater somehow ended up running the updates asynchronously.

So it happened again, on a different node, and it looks like a similar situation. It is driving me nuts; manual intervention is needed, and it is keeping my online score below 100% :slight_smile:

exited: storagenode-updater (exit status 1; not expected)

@Alexey do you think I should open an issue report on GitHub?

I forgot to “docker inspect” the stopped container.

docker log

2023-05-30T12:23:59.159Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.78.3”}

2023-05-30T12:23:59.159Z INFO New version is being rolled out but hasn’t made it to this node yet {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}

2023-05-30T12:38:53.725Z INFO Downloading versions. {“Process”: “storagenode-updater”, “Server Address”: “https://version.storj.io”}

2023-05-30T12:38:59.158Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode”, “Version”: “v1.78.3”}

2023-05-30T12:38:59.158Z INFO Download started. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.79.4/storagenode_linux_amd64.zip”, “To”: “/tmp/storagenode_linux_amd64.3880153684.zip”}

2023-05-30T12:39:10.244Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.79.4/storagenode_linux_amd64.zip”, “To”: “/tmp/storagenode_linux_amd64.3880153684.zip”}

2023-05-30T12:39:10.271Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode”}

2023-05-30T12:39:10.523Z INFO Service restarted successfully. {“Process”: “storagenode-updater”, “Service”: “storagenode”}

2023-05-30T12:39:10.529Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.78.3”}

2023-05-30T12:39:10.530Z INFO Download started. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.79.4/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.2856344600.zip”}

2023-05-30T12:39:10.966Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.79.4/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.2856344600.zip”}

2023-05-30T12:39:10.978Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}

2023-05-30 12:39:10,985 INFO exited: storagenode-updater (exit status 1; not expected)

2023-05-30 12:39:11,324 INFO spawned: ‘storagenode-updater’ with pid 14031

2023-05-30 12:39:11,325 INFO exited: storagenode (exit status 0; expected)

2023-05-30T12:39:11.345Z INFO Configuration loaded {“Process”: “storagenode-updater”, “Location”: “/app/config/config.yaml”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “healthcheck.details”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “server.private-address”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “server.address”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “healthcheck.enabled”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.email”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.wallet”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “contact.external-address”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “storage.allocated-disk-space”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “storage.allocated-bandwidth”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “operator.wallet-features”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file key {“Process”: “storagenode-updater”, “Key”: “console.address”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file value for key {“Process”: “storagenode-updater”, “Key”: “log.caller”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file value for key {“Process”: “storagenode-updater”, “Key”: “log.level”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file value for key {“Process”: “storagenode-updater”, “Key”: “log.output”}

2023-05-30T12:39:11.345Z INFO Invalid configuration file value for key {“Process”: “storagenode-updater”, “Key”: “log.stack”}

2023-05-30 12:39:11,347 INFO spawned: ‘storagenode’ with pid 14037

2023-05-30 12:39:11,347 WARN received SIGQUIT indicating exit request

2023-05-30 12:39:11,350 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die

2023-05-30 12:39:11,354 INFO stopped: storagenode-updater (terminated by SIGTERM)

2023-05-30 12:39:11,355 INFO stopped: storagenode (terminated by SIGTERM)

2023-05-30 12:39:11,357 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

node log

2023-05-30T12:39:11.180Z ERROR collector unable to update piece info {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “UL3JU5D6LVSZBQXYOFZCYAW6I5TO4DNRBONQ2WBNTXANT34H3QIQ”, “error”: “pieceexpirationdb: context canceled”, “errorVerbose”: “pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:567\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.180Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “UL3JU5D6LVSZBQXYOFZCYAW6I5TO4DNRBONQ2WBNTXANT34H3QIQ”, “error”: “pieces error: context canceled; v0pieceinfodb: context canceled”, “errorVerbose”: “pieces error: context canceled; v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:356\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:335\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.182Z ERROR collector unable to update piece info {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “HSSDPSBSI2UQBQ3QMMZZAC6NQKLRVT4XJUOOXU2MAQYZJPR5DQJA”, “error”: “pieceexpirationdb: context canceled”, “errorVerbose”: “pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:567\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.182Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “HSSDPSBSI2UQBQ3QMMZZAC6NQKLRVT4XJUOOXU2MAQYZJPR5DQJA”, “error”: “pieces error: context canceled; v0pieceinfodb: context canceled”, “errorVerbose”: “pieces error: context canceled; v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:356\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:335\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.223Z ERROR collector unable to update piece info {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Piece ID”: “W6M2WVBWCU6ZAFIEEXUT6HXR3D4AT3JIZHCCNEEDCEP73VBLYB4Q”, “error”: “pieceexpirationdb: context canceled”, “errorVerbose”: “pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:567\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.223Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Piece ID”: “W6M2WVBWCU6ZAFIEEXUT6HXR3D4AT3JIZHCCNEEDCEP73VBLYB4Q”, “error”: “pieces error: context canceled; v0pieceinfodb: context canceled”, “errorVerbose”: “pieces error: context canceled; v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:356\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:335\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.244Z ERROR collector unable to update piece info {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “WLNTC5PPL2IBCGP6ZUXBQWVSFTF26MZTERZ4SLCKU6SX556HTQ4A”, “error”: “pieceexpirationdb: context canceled”, “errorVerbose”: “pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:567\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.244Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “WLNTC5PPL2IBCGP6ZUXBQWVSFTF26MZTERZ4SLCKU6SX556HTQ4A”, “error”: “pieces error: context canceled; v0pieceinfodb: context canceled”, “errorVerbose”: “pieces error: context canceled; v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:356\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:335\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-05-30T12:39:11.244Z INFO collector collect {“Process”: “storagenode”, “count”: 3}

2023-05-30T12:39:11.244Z ERROR collector error during collecting pieces: {“Process”: “storagenode”, “error”: “pieceexpirationdb: context canceled”, “errorVerbose”: “pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:39\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:541\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:88\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

I have had such an issue in the past too.
I don’t have a log, so I can’t say if I had exactly the same message.
But it happened more than a couple of times and it is risky because updates are frequent.
Every time it happened, I simply had to start the node manually, and there was never an issue starting it. It just started and ran without problems.
So yes, maybe there is an issue with the updater that makes the restart after an update fail sometimes.

Yes, I think so. An issue on GitHub will allow the needed information to be collected, and it can then be picked up for work.

I experience the same from time to time, but I could not find out why it's happening or why the Docker containers do not restart when it happens, despite being configured to restart “unless-stopped”. Actually they're even configured to restart “always” now, but it does not help.
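Next time it dies I will try to capture what the Docker daemon itself recorded for the container, something like this (the container name is just an example):

docker events --since 24h --filter container=sn1 --filter event=die # Ctrl-C once the past events have printed
docker inspect -f 'exit={{.State.ExitCode}} finished={{.State.FinishedAt}} error={{.State.Error}}' sn1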

I understand that this won't help you directly, but I wanted to say that in the three years I have been operating a node, I've never had this kind of problem. Likely a different setup, hardware, software, etc.; it has just never happened to me that an upgrade was broken, except for one time where it was clear from the logs what had happened. For me the Storj-provided software has been very reliable.

I hope you will figure it out!


Yeah, thank you for the follow-up. These timing-hazard problems (which is what I believe this is) can be triggered by quite unrelated parameters, and it looks like I have the “right environment” for the problem now :slight_smile:

I have opened the issue on GitHub, hopefully under the correct project.


Aaand I have the same problem suddenly.
I “reproduced it” on the Windows GUI:
it's because I changed the location and time zone of the node.
Got a new location (which is +1 hour different from the previous one),
new internet, and a new IP here now, and I cannot return to the old one.
Trying to run the node in the new location.
I'm pretty sure all the files are okay
(it just got cut off from the internet while Storj was running, and can't go back on the old connection).
Is there some way to flush the cache on the satellites? I'm desperately trying to run it, to stop losing online time, but it keeps erroring in the log with “system clock is out of sync with all trusted satellites”.
I manually updated to some 1.80.something version, but still…
I'm scanning the disk for errors now… C got none, now the Storj disk…
Windows GUI

If the only error is “clock out of sync”, then you may sync the clock (disable sync with internet time, apply, enable sync with internet time, apply) or even run NetTime (https://www.timesynctool.com/).
But if you have some other errors before that, then the root cause is likely not related to the system clock.
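On Windows the same can be done from an elevated command prompt with the built-in w32tm tool, as an alternative to the GUI toggle or NetTime:

w32tm /resync
w32tm /query /status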

7 posts were split to a new topic: Tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connectex: No connection could be made because the target machine actively refused it

Just for reference for anyone finding this thread (OP here): since I did not get any reply on the bug report, I had to add a very nasty command to my crontab. It will start all stopped containers whose names match a regex. My storagenode containers are named sn%d. Hope it will be diagnosed sometime. It has already started a storagenode container twice.

docker start $(docker ps --format "{{.Names}}" --filter "status=exited" --filter "name=^sn\d+$") > /dev/null
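As a crontab entry it looks roughly like this (the five-minute schedule is only an example, pick whatever interval you like):

*/5 * * * * docker start $(docker ps --format "{{.Names}}" --filter "status=exited" --filter "name=^sn\d+$") > /dev/null 2>&1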

Did you try to use --restart always policy in your docker run command?