[Tech Preview] Linux Storage Node & Updater

@littleskunk I can confirm that the storagenode service is not started after an update:

Here is the updater log. As you can see, the updater restarted the storagenode service. Also, there are some new “Invalid configuration options” messages.

Here is storagenode log:

So this is not a restart, it is a stop. Please make sure that the updater restarts the storage node instead of just stopping it.

I looked into storj/restart_linux.go at commit 89e682b4d73dc2b1c4623e174095d3d441ceb1b6 (storj/storj on GitHub), and the restart function only performs a “stop”:

ADD:

Based on the latest restart issue, I would like to propose changes to the service units:

Now we have:

[Unit]
Description  = Storage Node service
After        = syslog.target network.target

[Service]
Type         = simple
User         = storj-storagenode
Group        = storj-storagenode
ExecStart    = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"
Restart      = on-failure
NotifyAccess = main

[Install]
Alias        = storagenode
WantedBy     = multi-user.target

After the change:

[Unit]
Description  = Storage Node service
After        = syslog.target network.target

[Service]
Type         = simple
User         = storj-storagenode
Group        = storj-storagenode
ExecStart    = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec   = 300
RestartSec   = 30
#Restart      = on-failure
Restart      = always
NotifyAccess = main

[Install]
Alias        = storagenode
WantedBy     = multi-user.target

The new parameters are described in the systemd.service(5) documentation:

TimeoutSec   = 300
RestartSec   = 30
Restart      = always

The same parameters can be applied to the storagenode-updater service.

When updating to the new version 1.21.2, the nodes did not start again. I’ll try making the changes to the service settings.

@littleskunk I’d like to draw your attention to this again: the storage node updater is not starting the storagenode service after an update.

Here is an example:


As you can see, the service starts again after 30 seconds, because I applied a workaround to the service unit.

The root cause is simple: the updater service kills itself and exits with exit code 1 (failure). After that exit, systemd restarts the updater service because the service failed (exit code 1).
The storage node service, on the other hand, is stopped by the updater service and exits with exit code 0 (success). After that exit, systemd does nothing and the service stays stopped, because we have Restart = on-failure.
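To make the reasoning above concrete, here is a simplified Go model of how systemd’s Restart= setting reacts when the main process exits. It is an illustration only, not systemd’s implementation: it looks at the exit code alone and ignores signals, timeouts, the watchdog and SuccessExitStatus=. The function name shouldRestart is hypothetical.

```go
package main

import "fmt"

// shouldRestart is a deliberately simplified model of systemd's Restart=
// setting. It only considers the main process's exit code; real systemd
// also looks at signals, timeouts, watchdog events and SuccessExitStatus=.
func shouldRestart(policy string, exitCode int) bool {
	switch policy {
	case "always":
		// Restart no matter how the process exited.
		return true
	case "on-failure":
		// A clean exit (code 0) is not treated as a failure.
		return exitCode != 0
	case "on-success":
		return exitCode == 0
	default: // "no", the systemd default
		return false
	}
}

func main() {
	// The two exits described above, plus the proposed workaround.
	fmt.Println("updater exits 1, on-failure:    ", shouldRestart("on-failure", 1))
	fmt.Println("storagenode exits 0, on-failure:", shouldRestart("on-failure", 0))
	fmt.Println("storagenode exits 0, always:    ", shouldRestart("always", 0))
}
```

One caveat from systemd.service(5): when a stop is initiated through systemd itself (for example `systemctl stop`), the unit is not restarted under any policy, so Restart = always only helps when the node process exits without systemd having been asked to stop it.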

Solution: please make the restart function of the updater service start the storage node service again after stopping it.

Could you please confirm this issue?

Thank you very much for the heads-up. I’ll be sure to keep an eye out when my storage nodes update.

I can confirm this issue. I had it today, and it caused ~8 hours of downtime on one of my nodes.

The updater service started downloading, seems to have stopped the main node service, then failed and got restarted by systemd. The main node service, however, was not restarted, because the updater service had failed.
No idea what caused it, though.

As a workaround, I changed the unit files of both services to Restart = always.

I can confirm that one of my nodes failed to restart after updating to 1.21.2

Same here, yes. It failed to start after the upgrade.

Thank you for reporting this issue. We are going to update the documentation to instruct users to set Restart=always in their configuration.

What’s the link to the latest documentation, please?

I would pay more attention to a potentially critical bug, abnormal memory consumption of “systemd-journal”:
Yesterday:
image
Today:
image

I am still working on determining the root cause and how to prevent it, and will post a solution soon.

PS: you can check the memory consumption on your side with:
ps -A --sort -rss -o comm,pmem,rss | head -n 20
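For the same check without ps, here is a minimal Linux-only Go sketch (Go 1.18+) that reads /proc directly. It assumes the standard /proc/<pid>/status layout (Name: and VmRSS: lines, VmRSS in kB) and reports resident memory only, not the pmem percentage. The command name topmem is hypothetical. Like ps’s comm column, the Name field is truncated to 15 characters, which is why the journal daemon shows up as systemd-journal.

```go
// Command topmem (hypothetical) lists the 20 processes with the largest
// resident set size, similar to: ps -A --sort -rss -o comm,rss | head -n 20
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// proc holds the two fields we need from /proc/<pid>/status.
type proc struct {
	name  string
	rssKB int
}

// parseStatus extracts Name and VmRSS (reported in kB) from the text of a
// /proc/<pid>/status file. Kernel threads have no VmRSS line, so ok is
// false for them and they are skipped.
func parseStatus(status string) (p proc, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, found := strings.Cut(sc.Text(), ":")
		if !found {
			continue
		}
		val = strings.TrimSpace(val)
		switch key {
		case "Name":
			p.name = val
		case "VmRSS":
			// The value looks like "123456 kB".
			fields := strings.Fields(val)
			if len(fields) == 0 {
				continue
			}
			if n, err := strconv.Atoi(fields[0]); err == nil {
				p.rssKB = n
				ok = true
			}
		}
	}
	return p, ok
}

func main() {
	files, err := filepath.Glob("/proc/[0-9]*/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var procs []proc
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			continue // the process may have exited in the meantime
		}
		if p, ok := parseStatus(string(data)); ok {
			procs = append(procs, p)
		}
	}
	// Largest resident set first, like `ps --sort -rss`.
	sort.Slice(procs, func(i, j int) bool { return procs[i].rssKB > procs[j].rssKB })
	if len(procs) > 20 {
		procs = procs[:20]
	}
	for _, p := range procs {
		fmt.Printf("%-16s %10d kB\n", p.name, p.rssKB)
	}
}
```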

PSA: the v1.22.2 ARM binary zip file appears to have two files in it, causing the following error during the update:

2021-02-17T14:25:53.336-0700        ERROR        Error updating service.        {"Service": "storagenode", "error": "archive should contain only one file", "errorVerbose": "archive should contain only one file\n\tmain.unpackBinary:94\n\tmain.downloadBinary:61\n\tmain.update:44\n\tmain.loopFunc:26\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:126\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:11\n\truntime.main:204"}

The zip file for the AMD64 binary does not have this issue:

The __MACOSX folder is likely tripping up the updater service.

I’ve manually updated my ARM node for this version.

@littleskunk, does anyone know if there’s an installer yet? Sorry to sound like a broken record, but I’ve finally got my internet connection at the new house and I’m ready to set up all my nodes again after running GE (graceful exit) last year.

Also, if there’s no installer, should I set up my new production nodes using this? Is this production-ready?

Any issues with playing it safe and setting my new nodes up with Docker? Will the new Multi-node Dashboard work with a Docker setup?

Cheers

@will.topping, for what it’s worth, I’ve been using this since v1.15.3 and it’s been as stable as the Docker nodes. I know a .deb or something similar is preferable, but the installation isn’t too onerous. As linked by @ACarneiro last month, I’ll pat myself on the back for the following instructions. A couple of binary downloads, a dedicated user, and a couple of systemd service files, and it’ll be up and running. Perhaps something to hold you over until an installer is released.

Thanks @fmoledina, I just spotted your post in the other thread and I think I’m gonna give it a go and hope for the best! I may be back with questions :joy:

So my Linux node hasn’t updated to the latest version.
Looking at my logs, I see the following error:

Feb 19 18:11:27 Juno storagenode-updater[31170]: 2021-02-19T18:11:27.258Z ERROR Error updating service. {"Service": "storagenode", "error": "archive should contain only one file", "errorVerbose": "archive should contain only one file\n\tmain.unpackBinary:94\n\tmain.downloadBinary:61\n\tmain.update:44\n\tmain.loopFunc:26\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:126\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:11\n\truntime.main:204"}

Any thoughts on what is wrong and what I can do to fix it?

I really need to start opening my eyes… :roll_eyes:

So presumably this is not something I can fix; I just have to wait until it’s fixed on the server side?

You need to download the archive manually from the releases page on GitHub and install it just like you did the very first time. It won’t update automatically, because the updater can’t get past the multiple-files error shown above.