Based on the latest restart issue, I would like to propose a change to the service files.
Currently we have:
[Unit]
Description = Storage Node service
After = syslog.target network.target
[Service]
Type = simple
User = storj-storagenode
Group = storj-storagenode
ExecStart = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"
Restart = on-failure
NotifyAccess = main
[Install]
Alias = storagenode
WantedBy = multi-user.target
After the change:
[Unit]
Description = Storage Node service
After = syslog.target network.target
[Service]
Type = simple
User = storj-storagenode
Group = storj-storagenode
ExecStart = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec = 300
RestartSec = 30
#Restart = on-failure
Restart = always
NotifyAccess = main
[Install]
Alias = storagenode
WantedBy = multi-user.target
The root cause is simple: the updater service kills itself and exits with exit code 1 (failure). systemd then restarts the updater service, because it counts as failed (exit code 1).
The storage node service, on the other hand, is stopped by the updater service and exits with exit code 0 (success). After that exit systemd does nothing and the service stays stopped, because we have Restart = on-failure.
Solution: please make the restart function in the updater service also start the storage node service.
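Until the updater does that, the Restart = always change proposed above could also be applied as a systemd drop-in instead of editing the unit file itself. This is only a sketch: the file path is my assumption and presumes the unit is installed as storagenode.service, so adjust it to your setup.

```ini
# Hypothetical path: /etc/systemd/system/storagenode.service.d/override.conf
# (assumes the unit is named storagenode.service)
[Service]
# Restart after any exit, including a clean exit with code 0
Restart=always
# Wait 30 seconds between restart attempts
RestartSec=30
```

Run systemctl daemon-reload afterwards so systemd picks up the drop-in. Keep in mind that systemd does not apply Restart= to a unit that was stopped deliberately with systemctl stop, so this only helps if the node process exits on its own or is signalled directly.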
I can confirm this issue. I had it today, and it caused about 8 hours of downtime on one of my nodes.
The updater service started downloading and seems to have stopped the main node service, then it failed and was restarted by systemd. The main node service was not restarted, because it was the updater service that failed.
No idea what caused it, though.
PSA: the v1.22.2 ARM binary zip file appears to have two files in it, causing the following error during the update.
2021-02-17T14:25:53.336-0700 ERROR Error updating service. {"Service": "storagenode", "error": "archive should contain only one file", "errorVerbose": "archive should contain only one file\n\tmain.unpackBinary:94\n\tmain.downloadBinary:61\n\tmain.update:44\n\tmain.loopFunc:26\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:126\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:11\n\truntime.main:204"}
@littleskunk does anyone know if there’s an installer yet? Sorry to sound like a broken record, but I’ve finally got my internet connection at the new house and I’m ready to set up all my nodes again after running GE last year.
Also, if there’s no installer, should I set up my new production nodes using this? Is this production ready?
Are there any issues with playing it safe and setting my new nodes up with Docker? Will the new Multi-node Dashboard work with a Docker setup?
@will.topping, for what it’s worth, I’ve been using this since v1.15.3 and it’s been as stable as the Docker nodes. I know a .deb or something similar is preferable, but the installation isn’t too onerous. As linked by @ACarneiro last month, I’ll pat myself on the back for the following instructions. A couple of binary downloads, a dedicated user, and a couple of systemd service files, and it’ll be up and running. Perhaps it's something to hold you over until an installer is released.
You need to download the archive manually from the releases page on GitHub and install it just like you did the very first time. It won’t update automatically, because the updater can’t get past the "archive should contain only one file" error shown above.