Another 21h of downtime I would've loved to prevent (rant)

[Screenshot: down]

All of these downtimes were caused by the Storj Windows services not restarting after a node update.

The only reason “storjserv3” was not affected is that I restarted that one some days ago.

i think there is a feature in windows somewhere that lets you auto-launch a service if it is detected as not running for like 5 minutes or whatever…

can’t say i’ve used it more than a few times over the last decade, but i’m pretty sure it exists or did.
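A minimal sketch of that idea in Python, assuming the node runs as a Windows service named `storagenode` (an assumption, check the real name in Services) and that the `sc query` output contains the token `RUNNING` when the service is up. The function names are hypothetical, and starting a service needs an administrator prompt.

```python
import subprocess
import time

SERVICE = "storagenode"  # assumed service name, verify it on your machine


def is_running(sc_query_output: str) -> bool:
    """Return True if the `sc query` output reports the RUNNING state."""
    return any("RUNNING" in line for line in sc_query_output.splitlines())


def check_and_restart(service: str = SERVICE) -> bool:
    """Start the service if it is not running; return True if a start was attempted."""
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    if is_running(result.stdout):
        return False
    subprocess.run(["sc", "start", service], check=False)
    return True


def watch(interval_seconds: int = 300) -> None:
    """Poll forever; 300 seconds matches the five minutes mentioned above."""
    while True:
        check_and_restart()
        time.sleep(interval_seconds)
```

Calling `watch()` from a scheduled task or a small always-on script would give roughly the behaviour described, at the cost of running one more thing that can itself fail.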

I update my node manually, so that things like this do not happen. I start the update when I know I’ll be present for at least an hour after it, so that I can fix any problems that appear because of the update.


Same here… manual updates… tho i wouldn’t mind using watchtower if i could push a button or give it a command to update… or if it told me there was a new update and i could schedule it for when i wanted.

I don’t know how to update it manually. If this is possible, this belongs in the documentation.

Same happened to me, 7 hours of downtime thanks to it while i was sleeping…


You can do this by only starting watchtower when you need it.


Go to Services, double-click the Storage Node Update service, open the Recovery tab, and select “Restart the Service” in the recovery options :wink:
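The same Recovery setting can also be applied from the command line with `sc failure`. Below is a small Python sketch that builds and runs that command; the service name passed in, the 60-second restart delay and the one-day reset period are assumptions, not values from this thread. Note that Windows recovery actions fire when a service terminates unexpectedly, so they may not help if the service was stopped cleanly and simply never started again.

```python
import subprocess


def recovery_command(service: str, delay_ms: int = 60_000, reset_s: int = 86_400) -> list[str]:
    """Build an `sc failure` command that restarts the service after each failure."""
    # Three actions cover the first, second and subsequent failures.
    actions = "/".join(f"restart/{delay_ms}" for _ in range(3))
    return ["sc", "failure", service, "reset=", str(reset_s), "actions=", actions]


def apply_recovery(service: str) -> None:
    """Run the command; needs an elevated (administrator) prompt on Windows."""
    subprocess.run(recovery_command(service), check=True)
```

For example, `apply_recovery("storagenode-updater")` would configure the updater service, assuming that is what the service is called on your machine.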


As my starting post states, I’m on Windows

Thank you, did that on one of my nodes to try it out :slight_smile:


I have the same problem. The 2nd update this month gave me 10 hours of downtime; the first one I noticed in the morning, after 3 hours…

It is really annoying, since all the other services I run work fine! It must be a STORJ problem…


manual update in docker is pretty straightforward

docker stop storagenode

docker rm storagenode

docker pull storjlabs/storagenode:latest

docker run (you know the drill; otherwise it’s explained in the storagenode configuration documentation, as a ton of parameters go here)…

and then i usually do a

docker logs --tail 20 storagenode --follow

just to make sure everything seems to be in order…
my pre-update checklist is verifying that there isn’t a lot of grief on the forum in the first few days after an update is released.

aside from that, in a few key spots where i work or spend long periods, i have an old monitor that i let live logs run on, so that i can get a sense of whether there are any hints of issues, especially if i’m troubleshooting

otherwise i would recommend setting up alerts for various unwanted events, so that the system can get in contact with you through an email or sms… ofc this is a bit of a can of worms, as there are like 1000 different variations of errors that won’t trigger such alerts…

i like the monitor and live log solution; it’s not super practical in many cases… but it sure is effective for catching some issues.

i’m sure i will eventually solve the whole downtime alert in a practical, open-source way, but sadly not there yet…

You forgot to update this node:

docker pull storjlabs/storagenode:latest

please do not mislead others with incomplete information …


oh my… ye be right…

fixed it

Please, could you show the log of storagenode-updater?

2020-09-27T08:30:17.638+0200    INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2020-09-27T08:30:18.292+0200    INFO    Version is up to date   {"Service": "storagenode"}
2020-09-27T08:30:18.292+0200    INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\Windows\\TEMP\\storagenode-updater_windows_amd64.exe.864463516.zip"}
2020-09-27T08:30:20.447+0200    ERROR   Error updating service. {"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-27T08:45:17.633+0200    INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2020-09-27T08:45:18.274+0200    INFO    Version is up to date   {"Service": "storagenode"}
2020-09-27T08:45:18.274+0200    INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\Windows\\TEMP\\storagenode-updater_windows_amd64.exe.564032331.zip"}
2020-09-27T08:45:20.368+0200    ERROR   Error updating service. {"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-27T09:00:17.630+0200    INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2020-09-27T09:00:18.275+0200    INFO    Version is up to date   {"Service": "storagenode"}
2020-09-27T09:00:18.275+0200    INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\Windows\\TEMP\\storagenode-updater_windows_amd64.exe.812528686.zip"}
2020-09-27T09:00:20.315+0200    ERROR   Error updating service. {"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-27T09:15:17.617+0200    INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2020-09-27T09:15:18.264+0200    INFO    Version is up to date   {"Service": "storagenode"}
2020-09-27T09:15:18.264+0200    INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\Windows\\TEMP\\storagenode-updater_windows_amd64.exe.008359861.zip"}
2020-09-27T09:15:20.535+0200    ERROR   Error updating service. {"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-27T09:30:17.603+0200    INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2020-09-27T09:30:18.253+0200    INFO    Version is up to date   {"Service": "storagenode"}
2020-09-27T09:30:18.253+0200    INFO    Download started.       {"From": "https://github.com/storj/storj/releases/download/v1.13.1/storagenode-updater_windows_amd64.exe.zip", "To": "C:\\Windows\\TEMP\\storagenode-updater_windows_amd64.exe.726030736.zip"}
2020-09-27T09:30:20.407+0200    ERROR   Error updating service. {"Service": "storagenode-updater", "error": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.", "errorVerbose": "open C:\\Program Files\\Storj\\Storage Node\\storagenode-updater.1.13.1.exe: Die Datei ist vorhanden.\n\tmain.downloadBinary:62\n\tmain.updateSelf:64\n\tmain.loopFunc:36\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:116\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:55\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

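so basically the updater ran into a loop and stayed there: every 15 minutes it downloads v1.13.1 again and fails because storagenode-updater.1.13.1.exe already exists (“Die Datei ist vorhanden” is German for “The file exists”)… it happens with new software, it takes a lot of time to account for all the variations that can happen… so not really an error, more of an oversight of some sort i guess…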
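A loop like the one in the log above can be spotted automatically by counting identical errors. This is only a sketch: it assumes each log line has the shape timestamp, level, message, JSON payload, as in the pasted excerpt, and the function name and the threshold of three repeats are made up for illustration.

```python
import json
import re
from collections import Counter

# timestamp, level, free-text message, trailing JSON object
LINE_RE = re.compile(r"^(\S+)\s+(DEBUG|INFO|WARN|ERROR|FATAL)\s+(.*?)\s*(\{.*\})\s*$")


def repeated_errors(lines, threshold=3):
    """Return {(service, error): count} for errors seen at least `threshold` times."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group(2) != "ERROR":
            continue
        try:
            fields = json.loads(match.group(4))
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        counts[(fields.get("Service"), fields.get("error"))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Fed the updater log, this would report the same "file exists" error for `storagenode-updater` once per 15-minute cycle, which is a reasonable trigger for an alert.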

maybe it should have sent an email, or in some way tried to get in contact with the SNO, or phoned home and reported the issue. then, in case of major issues with the updater, storj would have a database that starts to grow rapidly and can serve as an early warning system, so the damage could be minimized and countered with a different update before the problem spreads too deep into the network… or the bad release could be skipped completely, like in the past.

You can set Uptimerobot to do that.

But for me, it seldom helps, because the node tends to update right after I go to sleep. 7+ hours downtime for me this night.
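For anyone who prefers to run such a check themselves, the core of an external uptime probe is just a TCP connection attempt to the node's public port. A minimal sketch, assuming the node listens on the default storagenode port 28967 and using a placeholder hostname; `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example (placeholder address, replace with your node's external address):
# port_is_open("my-node.example.com", 28967)
```

The check has to run from outside your own network to be meaningful, since a probe from the same machine would not notice a router or ISP problem.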


If you look at the screenshot I provided, you can see that this problem is the same for me :smiley:

yeah, i would want the sms option to ensure good coverage, but that’s $5, and for that i could just as well hook up an old mobile phone and make it keep track of whether the server is online and send me an sms…

then i could make whatever kind of setup with it that i like…

but i’m getting a vps set up, so i will most likely be using that to send me alerts about storagenode status on the go or when i sleep; might as well restrict the subscriptions i need as much as i can to reduce overhead costs.

maybe simply make it so the server sends an email with a status update, then have a script read it and check that everything is within acceptable parameters, and if it isn’t, send an sms reporting the inconsistency.
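The "check that everything is within acceptable parameters" step could look roughly like this. It is a sketch only: the report is assumed to have already been parsed into a dictionary, the field names and limits are invented for illustration, and the actual email reading and sms sending are left out.

```python
def find_violations(report: dict, limits: dict) -> list[str]:
    """Compare each reported value with its (low, high) limit and list the problems."""
    problems = []
    for name, (low, high) in limits.items():
        value = report.get(name)
        if value is None:
            problems.append(f"{name}: missing from report")
        elif not (low <= value <= high):
            problems.append(f"{name}: {value} outside {low}..{high}")
    return problems


# Hypothetical limits; tune them to whatever your status report contains.
LIMITS = {
    "minutes_since_last_log": (0, 15),
    "errors_last_hour": (0, 5),
}
```

If `find_violations(report, LIMITS)` returns a non-empty list, that list is the text you would put in the sms; a report that never arrives should be treated as a violation too.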