Another 21h of downtime I would've loved to prevent (rant)

My Windows node has been auto-updating for 6 months or so now without a single issue. Now it suddenly broke, and based on the comments here this must have been fairly common. Why couldn’t Storj just send an email to the network telling everyone “hey, you might have to restart your node manually”?

Looks like I have to monitor the service. It was just luck that I discovered it today; the node went down yesterday.
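For anyone who wants to monitor the service themselves, here is a minimal sketch of a watchdog, not an official Storj tool. It assumes the Windows service is named `storagenode` (the name shown in the updater log; your install may differ) and it uses the standard `sc query` / `sc start` commands. The function names `parse_sc_state` and `ensure_running` are made up for this example.

```python
import re
import subprocess
from typing import Optional


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def ensure_running(service: str, run=subprocess.run) -> Optional[str]:
    """Query the service and try to start it if it is stopped.

    Returns the state seen before any restart attempt, or None if the
    state could not be read (for example, the service does not exist).
    `run` is injectable so the logic can be tried without touching Windows.
    """
    result = run(["sc", "query", service], capture_output=True, text=True)
    state = parse_sc_state(result.stdout)
    if state == "STOPPED":
        # Assumption: the script runs with enough privileges to start services.
        run(["sc", "start", service], capture_output=True, text=True)
    return state
```

Calling `ensure_running("storagenode")` from a scheduled task every few minutes would be one way to use it. It only restarts a stopped service; it does not detect a node that is running but offline.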

I’ve had this very same issue with like every other update or so.

Also, I stopped manually restarting the node service since that NEVER forced the update, although it was said to behave like that.

The update service has worked fine since I made it auto-start, running as system. This has worked for at least 3 months, every single update.
Now it failed. I just restarted the update service and started the main service, and the node was updated.

This time there was a DB update; perhaps that failed once and the update service is just stupid and doesn’t retry. Hard to tell, as there are no logs for it.

Same here: I had never had a problem with updates until 2020-09-26 01:13:21, and the downtime lasted 42 hours, 18 minutes. I hope we do not run into disqualifications, since the problem was caused by the update and not by the node.

36 hours of downtime might get you a suspension. I’m not sure whether that is actually active or whether one just gets the warning… I think it was added in v1.12.3.

Just to add the relevant log from my Windows node with the same issue:

2020-09-25T21:31:31.607+0100	INFO	Downloading versions.	{"Server Address": "https://version.storj.io"}
2020-09-25T21:31:32.104+0100	INFO	New version is being rolled out but hasn't made it to this node yet	{"Service": "storagenode"}
2020-09-25T21:31:32.104+0100	INFO	New version is being rolled out but hasn't made it to this node yet	{"Service": "storagenode-updater"}
2020-09-25T21:46:31.602+0100	INFO	Downloading versions.	{"Server Address": "https://version.storj.io"}
2020-09-25T21:46:32.097+0100	INFO	Download started.	{"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode_windows_amd64.exe.zip", "To": "C:\\WINDOWS\\TEMP\\storagenode_windows_amd64.exe.103577742.zip"}
2020-09-25T21:46:34.718+0100	INFO	Download finished.	{"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode_windows_amd64.exe.zip", "To": "C:\\WINDOWS\\TEMP\\storagenode_windows_amd64.exe.103577742.zip"}
2020-09-25T21:46:34.877+0100	INFO	Restarting service.	{"Service": "storagenode"}
2020-09-25T21:46:35.208+0100	INFO	Service restarted successfully.	{"Service": "storagenode"}
2020-09-25T21:46:35.210+0100	INFO	Download started.	{"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\WINDOWS\\TEMP\\storagenode-updater_windows_amd64.exe.177710741.zip"}
2020-09-25T21:46:36.525+0100	INFO	Download finished.	{"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\WINDOWS\\TEMP\\storagenode-updater_windows_amd64.exe.177710741.zip"}
2020-09-25T21:46:36.676+0100	INFO	Restarting service.	{"Service": "storagenode-updater"}
2020-09-25T22:01:31.600+0100	INFO	Downloading versions.	{"Server Address": "https://version.storj.io"}
2020-09-25T22:01:32.110+0100	INFO	Version is up to date	{"Service": "storagenode"}
2020-09-25T22:01:32.110+0100	INFO	Download started.	{"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\WINDOWS\\TEMP\\storagenode-updater_windows_amd64.exe.997457136.zip"}
2020-09-25T22:01:33.998+0100	ERROR	Error updating service.	{"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: The file exists.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: The file exists.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
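To spot entries like that last ERROR line without reading the whole file, the log can be scanned with a small script. This is a rough sketch that assumes the format seen above: tab-separated timestamp, level, message, and an optional JSON object. The function names `parse_line` and `find_errors` are hypothetical, and the log file location is left to the caller because it depends on the install.

```python
import json
from typing import Iterable, Optional


def parse_line(line: str) -> Optional[dict]:
    """Split one tab-separated log line into its parts.

    Returns None for lines that do not have at least
    timestamp, level and message columns.
    """
    parts = line.rstrip("\r\n").split("\t", 3)
    if len(parts) < 3:
        return None
    fields = {}
    if len(parts) == 4:
        try:
            fields = json.loads(parts[3])
        except json.JSONDecodeError:
            fields = {}  # keep the entry even if the JSON part is damaged
    return {
        "timestamp": parts[0],
        "level": parts[1],
        "message": parts[2],
        "fields": fields,
    }


def find_errors(lines: Iterable[str]) -> list:
    """Return all parsed entries whose level is ERROR."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e["level"] == "ERROR"]
```

Passing an open file object to `find_errors` works, since iterating a file yields lines. On the excerpt above it would return the single "Error updating service." entry for the `storagenode-updater` service.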

Hi all,
We’re working quickly to resolve this.

Hold tight!

This bug is fixed in 1.13.3

Better would be to use a docker-compose file to describe each node. Then updates become as simple as docker-compose pull && docker-compose up -d -t 300. And if no update is available, nothing happens.
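If you want to run that from a script or scheduler instead of typing it, the two commands can be wrapped as below. This is only a sketch: it assumes `docker-compose` is on the PATH and that each node has its own directory containing a compose file. The function name `update_node` and the directory layout are invented for the example.

```python
import subprocess


def update_node(compose_dir: str, run=subprocess.run) -> bool:
    """Pull newer images and recreate containers for one node directory.

    Mirrors `docker-compose pull && docker-compose up -d -t 300`:
    the second command only runs if the pull succeeded.
    `run` is injectable so the logic can be exercised without Docker.
    """
    pull = run(["docker-compose", "pull"], cwd=compose_dir)
    if pull.returncode != 0:
        return False
    # -t 300 gives the container up to 300 seconds to shut down cleanly.
    up = run(["docker-compose", "up", "-d", "-t", "300"], cwd=compose_dir)
    return up.returncode == 0
```

Looping over the node directories and calling `update_node` on each keeps the behaviour of the one-liner, including the part where nothing is recreated when no newer image is available.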

That’s interesting. I’m not really familiar with the nuances of docker commands, aside from the basic log-related ones and how to stop, start and pull…

Looks very useful, I’m going to have to look into exactly how that is done…