[Tech Preview] Linux Storage Node & Updater

Odmin · January 22, 2021, 9:31pm

Please see my previous post; I marked the non-critical errors with red arrows.

fmoledina · January 22, 2021, 9:48pm

Ah, I see that now, interesting. I don’t have color set in my config.yaml and I’ve set log.level: info. Upon upgrading to 1.19.6, I also saw the Invalid configuration value for key error for log.level but that has gone away with the upgrade to 1.20.2. What is log.level set to for you?

Odmin · January 23, 2021, 8:31am

I have log.level: info too.

madest13 · January 27, 2021, 12:41pm

I migrated 8 nodes to run without Docker. They work without problems and have been updating since version 1.16.1.

Tonight, when updating to version 1.21, all files were downloaded without problems, but the node did not start afterwards. There are no errors in the logs; the node simply did not start.

Odmin · January 27, 2021, 2:12pm

Do you have any information from the updater log?

Odmin · January 29, 2021, 7:48am

@littleskunk I can confirm that the storagenode service is not started after an update:

Here is the updater log. As you can see, the updater restarted the storagenode service. We also have some new “Invalid configuration options”.

Here is the storagenode log:

So there is no restart here, only a stop. Please make sure that the updater restarts the storage node instead of just stopping it.

Odmin · January 29, 2021, 8:49am

I looked into restart_linux.go in storj/storj on GitHub (commit 89e682b4d73dc2b1c4623e174095d3d441ceb1b6)
and I see only a “stop” in the restart function:

ADD:

Based on the latest restart issue, I would like to propose changes for services:

Now we have:

[Unit]
Description  = Storage Node service
After        = syslog.target network.target

[Service]
Type         = simple
User         = storj-storagenode
Group        = storj-storagenode
ExecStart    = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"
Restart      = on-failure
NotifyAccess = main

[Install]
Alias        = storagenode
WantedBy     = multi-user.target

After change:

[Unit]
Description  = Storage Node service
After        = syslog.target network.target

[Service]
Type         = simple
User         = storj-storagenode
Group        = storj-storagenode
ExecStart    = /opt/storagenode/bin/storagenode run --config-dir "/etc/storagenode/config"

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec   = 300
RestartSec   = 30
#Restart      = on-failure
Restart      = always
NotifyAccess = main

[Install]
Alias        = storagenode
WantedBy     = multi-user.target

Description for new parameters is here

TimeoutSec   = 300
RestartSec   = 30
Restart      = always

The same parameters can be applied to the storagenode-updater service.
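One way to apply these settings without editing the installed unit files is a systemd drop-in. This is only a minimal sketch, not an official procedure: the unit file names (`storagenode.service`, `storagenode-updater.service`), the drop-in file name `restart.conf`, and the helper `write_restart_dropin` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write a drop-in that overrides the restart settings of a unit.
# Assumptions (not from the thread): the unit names and the drop-in file name.
write_restart_dropin() {
    unit="$1"                         # e.g. storagenode-updater (assumed unit name)
    base="${2:-/etc/systemd/system}"  # standard location for local overrides
    dir="$base/$unit.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
TimeoutSec = 300
RestartSec = 30
Restart    = always
EOF
}

# Usage (as root), then reload systemd so the override is picked up:
#   write_restart_dropin storagenode
#   write_restart_dropin storagenode-updater
#   systemctl daemon-reload
```

A drop-in keeps the original unit file untouched, so the override can be removed later by deleting the `restart.conf` file and reloading systemd.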

madest · January 30, 2021, 1:56pm

When updating to the new version 1.21.2, the nodes again did not start. I’ll try changing the service settings.

Odmin · February 4, 2021, 11:05am

@littleskunk I’d like to draw your attention to this again: the storage node updater does not start the storagenode service after an update.

Here is an example:

As you can see, the service starts after 30 seconds because I applied the workaround to the service.

The root cause is simple: the updater service kills itself and exits with exit code 1 (failure). After the exit, systemd restarts the updater service because it failed (exit code 1).
The storagenode service is stopped by the updater service and exits with exit code 0 (success). After that exit, systemd does nothing (the service stays stopped), because we have Restart = on-failure.
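The difference between the two exits can be shown with a small model of the restart decision. This is a deliberately simplified sketch that looks at exit codes only; real systemd also considers signals, timeouts, the watchdog and explicit stop requests. The function names `should_restart` and `describe` are made up for this illustration.

```shell
#!/bin/sh
# Simplified model of systemd's Restart= decision, exit codes only.
# It ignores signals, timeouts, the watchdog and explicit "systemctl stop".
should_restart() {
    policy="$1"; code="$2"
    case "$policy" in
        always)     return 0 ;;
        on-failure) [ "$code" -ne 0 ] ;;
        on-success) [ "$code" -eq 0 ] ;;
        *)          return 1 ;;   # "no" and anything this sketch does not model
    esac
}

describe() {
    if should_restart "$1" "$2"; then
        echo "Restart=$1, exit $2: restarted"
    else
        echo "Restart=$1, exit $2: stays stopped"
    fi
}

describe on-failure 1   # updater: exits with a failure code
describe on-failure 0   # storagenode: exits cleanly
describe always 0       # storagenode with the workaround applied
```

Under this model only the second case stays stopped, which matches what the logs show: the updater comes back, the storage node does not, unless Restart = always is set.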

Solution: please add a start of the storagenode service to the restart function in the updater service.

Could you please confirm this issue?

ACarneiro · February 4, 2021, 11:53am

Thank you very much for the heads-up. I’ll be sure to keep an eye out when my storage nodes update.

badfrog · February 5, 2021, 10:59pm

I can confirm this issue. I had it today, and it caused about 8 hours of downtime on one of my nodes.

The updater service started downloading, seems to have stopped the main node service, and then failed. It got restarted by systemd, but the main node service did not, because it was the updater service that failed.
No idea what caused it, though.

I changed the unit files of both services to Restart = always as a workaround.

ACarneiro · February 7, 2021, 12:10pm

I can confirm that one of my nodes failed to restart after updating to 1.21.2

fmoledina · February 7, 2021, 7:29pm

Same here, yes. It failed to start after the upgrade.

Yingrong · February 8, 2021, 10:26pm

Thank you for reporting this issue. We are going to update the documentation to instruct users to set their configuration to Restart=always.

will.topping · February 8, 2021, 10:51pm

What’s the link to the latest documentation, please?

Alexey · February 9, 2021, 2:52am

https://github.com/storj/storagenode-deb/blob/1857f4e9afde4100a5450c8608a05019a921f58f/docs/contributor-guide.md

# Storage Node Debian Package Contributor Guide

## Dependencies
To build the package, you will need the debian packaging tools: build-essential, devscripts and debhelper.

## Build the package
Once they are installed, go to the 'packaging' directory and run the following command:
```
dpkg-buildpackage -us -uc -b
```
If successful, it should create a storagenode-version.deb file. You can install it on a debian system using the following command:
```
dpkg -i storagenode-version.deb
```

## Structure

### `control`
This file contains the information about the package: its name, its description, its dependencies (here debhelper >= 10 is necessary for installing the systemd services easily),
the architecture it is made for and infos about the maintainer.

This file has been truncated.

Odmin · February 14, 2021, 11:04am

I would like to draw attention to a potentially critical bug: abnormal memory consumption by “systemd-journal”.
Yesterday:

Today:

I’m still working on determining the root cause and preventing it, and will post a solution soon ™

PS. you can check memory consumption on your side with:
ps -A --sort -rss -o comm,pmem,rss | head -n 20

fmoledina · February 18, 2021, 4:11pm

PSA: the v1.22.2 ARM binary zip file appears to have two files in it, causing the following error during the update.

2021-02-17T14:25:53.336-0700        ERROR        Error updating service.        {"Service": "storagenode", "error": "archive should contain only one file", "errorVerbose": "archive should contain only one file\n\tmain.unpackBinary:94\n\tmain.downloadBinary:61\n\tmain.update:44\n\tmain.loopFunc:26\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:126\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:11\n\truntime.main:204"}

The zip file for the AMD64 binary does not have this issue:

The __MACOSX folder is likely tripping up the updater service.

I’ve manually updated my ARM node for this version.
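To check what an archive contains before the updater trips over it, the entries can be listed and the macOS metadata filtered out. A minimal sketch, assuming `unzip` is installed; the archive name in the usage comment and the helper `payload_entries` are hypothetical, and the updater itself may count entries differently.

```shell
#!/bin/sh
# Read archive entry names on stdin and print only the real payload:
# drops the __MACOSX/ metadata tree and bare directory entries.
payload_entries() {
    grep -v -e '^__MACOSX/' -e '/$' || true
}

# Usage (archive name is hypothetical); unzip -Z1 prints one entry name per line:
#   unzip -Z1 storagenode_linux_arm.zip | payload_entries
#   unzip -Z1 storagenode_linux_arm.zip | payload_entries | wc -l
```

If the full listing shows more than one entry but the filtered listing shows exactly one, the extra entries are only the `__MACOSX` metadata.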

will.topping · February 18, 2021, 4:53pm

@littleskunk does anyone know if there’s an installer yet? Sorry to sound like a broken record, but I’ve finally got my internet connection at the new house and I’m ready to set up all my nodes again after running GE last year.

Also, if there’s no installer, should I set up my new production nodes using this? Is this production ready?

Are there any issues with playing it safe and setting my new nodes up with Docker? Will the new Multi-node Dashboard work with a Docker setup?

Cheers

fmoledina · February 18, 2021, 5:12pm

@will.topping, for what it’s worth, I’ve been using this since v1.15.3 and it’s been as stable as the Docker nodes. I know a .deb or something is preferable, but the installation isn’t too onerous. As linked by @ACarneiro last month, I’ll pat myself on the back for the following instructions. A couple of binary downloads, a dedicated user, and a couple of systemd service files, and it’ll be up and running. Perhaps something to hold you over until an installer is released.