Failed to start new storagenode with migrated DB

Hi!
Can’t start a new node because the databases are reported as missing.
I’m moving the databases to a different SSD, and the target directory is still empty.

Here is what I’ve done:

in config.yaml:

storage2.database-dir: "dbs"

in the initial setup command:

--user $(id -u):$(id -g) \
--mount type=bind,source="/var/storj/storagenode4/identity",destination=/app/identity \
--mount type=bind,source="/var/storj/storagenode4/data",destination=/app/config \
--name storagenode4 storjlabs/storagenode:latest

in the startup command:

docker run -d --restart unless-stopped --stop-timeout 300 \
-p 28967:28972/tcp \
-p 28967:28972/udp \
-p 14006:14002 \
-e WALLET="XXX" \
-e EMAIL="YYY@YYY.YYY" \
-e ADDRESS="ZZZ:28972" \
-e STORAGE="10TB" \
--user $(id -u):$(id -g) \
--mount type=bind,source="/var/storj/storagenode4/identity",destination=/app/identity \
--mount type=bind,source="/var/storj/storagenode4/data",destination=/app/config \
--mount type=bind,source="/var/storj/db/storagenode4",destination=/app/dbs \
--log-opt max-size=50m \
--log-opt max-file=10 \
--name storagenode4 storjlabs/storagenode:latest

The mount points are:

sdb       8:16   1  10.9T  0 disk /var/storj/storagenode4
nvme0n1 259:0    0 119.2G  0 disk /var/storj/db

All the mount points are under /var/storj; this directory was created on the system disk.

I’m getting the following output in the logs:

INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}
INFO    Anonymized tracing enabled      {"Process": "storagenode"}
INFO    Operator email  {"Process": "storagenode", "Address": "YYY@YYY.YYY"}
INFO    Operator wallet {"Process": "storagenode", "Address": "XXX"}
INFO    db      database does not exist {"Process": "storagenode", "database": "info"}
INFO    db      database does not exist {"Process": "storagenode", "database": "bandwidth"}
INFO    db      database does not exist {"Process": "storagenode", "database": "orders"}
INFO    db      database does not exist {"Process": "storagenode", "database": "piece_expiration"}
INFO    db      database does not exist {"Process": "storagenode", "database": "pieceinfo"}
INFO    db      database does not exist {"Process": "storagenode", "database": "piece_spaced_used"}
INFO    db      database does not exist {"Process": "storagenode", "database": "reputation"}
INFO    db      database does not exist {"Process": "storagenode", "database": "storage_usage"}
INFO    db      database does not exist {"Process": "storagenode", "database": "used_serial"}
INFO    db      database does not exist {"Process": "storagenode", "database": "satellites"}
INFO    db      database does not exist {"Process": "storagenode", "database": "notifications"}
INFO    db      database does not exist {"Process": "storagenode", "database": "heldamount"}
INFO    db      database does not exist {"Process": "storagenode", "database": "pricing"}
INFO    db      database does not exist {"Process": "storagenode", "database": "secret"}
INFO    db      database does not exist {"Process": "storagenode", "database": "garbage_collection_filewalker_progress"}
INFO    db      database does not exist {"Process": "storagenode", "database": "used_space_per_prefix"}
INFO    server  kernel support for server-side tcp fast open remains disabled.  {"Process": "storagenode"}
INFO    server  enable with: sysctl -w net.ipv4.tcp_fastopen=3  {"Process": "storagenode"}
INFO    Telemetry enabled       {"Process": "storagenode", "instance ID": "TTT"}
INFO    Event collection enabled        {"Process": "storagenode", "instance ID": "TTT"}
ERROR   failure during run      {"Process": "storagenode", "error": "Error migrating tables for database on storagenode: migrate: database: info opening fil
FATAL   Unrecoverable error     {"Process": "storagenode", "error": "Error migrating tables for database on storagenode: migrate: database: info opening fil

What have I done wrong? Please help.

Please make sure that you mounted the drive with the mandatory exec permission option.
Please also make sure to add an exec permission to the bin subfolder in the data location.
And, of course, make sure that $(id -u):$(id -g) is the owner of the storage location and has at least rw permissions for the data location and rwx permissions for the bin subfolder.
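
A minimal sketch of how to check and fix this, assuming the paths from your mounts above (the exact bin location inside the data folder is an assumption based on where the image extracts its binaries):

# show the mount options actually in effect (look for a "noexec" flag)
findmnt -no OPTIONS /var/storj/storagenode4

# give the current user ownership of the data location
sudo chown -R $(id -u):$(id -g) /var/storj/storagenode4/data

# rw on the data location, rwx on the bin subfolder
sudo chmod -R u+rw /var/storj/storagenode4/data
sudo chmod -R u+rwx /var/storj/storagenode4/data/bin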

Alexey, please point me to where I can read about this.
If you’re talking about changes in /etc/fstab, here is what I have:

# /dev/sdb1
UUID=XXX /               ext4    errors=remount-ro 0       1
# swap was on /dev/sdc5 during installation
UUID=ZZZ none            swap    sw              0       0
# /dev/sda \STORJ\storagenode4
UUID="YYY" /var/storj/storagenode4 ext4 errors=remount-ro 0 2
# /dev/nvme0n1 \STORJ\DataBases
UUID="TTT" /var/storj/db ext4 errors=remount-ro 0 2

I also checked the ownership and permissions of the db folder:
it shows root:root instead of user:docker.
I made changes with:

chown -R -c user:docker /var/storj
chmod -R -c u+rwx /var/storj

I also tried mkdir for a new directory in /var/storj - the new one is created as root:root too.
So every newly created directory or file gets root:root despite the chmod and chown.
I don’t know why it works like this.

You need to have either defaults, or an explicit exec permission in the list of options.

It seems you run it with sudo, I would assume?

However, granting +x to the bin subfolder should solve the issue.

So there you need to also add rw, or at least exec.

# /dev/sda \STORJ\storagenode4
UUID="YYY" /var/storj/storagenode4 ext4 rw,exec,errors=remount-ro 0 2
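
After editing /etc/fstab, the new options take effect only after a remount; a quick sketch to apply and verify:

sudo mount -o remount /var/storj/storagenode4
findmnt -no OPTIONS /var/storj/storagenode4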

yes

chmod -R -c +x /bin?

Without any additions, it currently looks like this:

ls -l /bin
lrwxrwxrwx 1 root root 7 Jul  4 04:04 /bin -> usr/bin


It looks like every new directory or file created in /var/storj after

sudo chown -R -c user:docker /var/storj
sudo chmod -R -c u+rwx /var/storj

still gets root:root despite these commands :(

I added the missing options:

# /dev/sdb1
UUID=YYY /               ext4    errors=remount-ro 0       1
# /dev/sda \STORJ\storagenode4
UUID="XXX" /var/storj/storagenode4 ext4 rw,exec,errors=remount-ro 0 2
# /dev/nvme0n1 \STORJ\DataBases
UUID="ZZZ" /var/storj/db ext4 rw,exec,errors=remount-ro 0 2

but I’m still getting root:root on new directories.

No, it should be:

chmod -R -c +x /var/storj/storagenode4/bin

You will, because you run it with sudo.

To avoid getting root:root, you should configure Docker to run without sudo, change the owner to $(id -u):$(id -g) recursively for the data location, and add --user $(id -u):$(id -g) to your docker run command before the image name.
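
A minimal sketch of that setup, assuming a standard Docker installation where the docker group already exists:

# allow the current user to talk to the Docker daemon without sudo
sudo usermod -aG docker $(whoami)
# log out and back in (or run "newgrp docker") for the group change to apply

# hand the data location to the user the container will run as
sudo chown -R $(id -u):$(id -g) /var/storj/storagenode4

# then start the container as that user, before the image name:
# docker run -d ... --user $(id -u):$(id -g) ... storjlabs/storagenode:latest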

This wasn’t necessary; only the data location needs it.
By the way, I checked on Ubuntu 20.04: the rw mount option includes exec by default, so you actually shouldn’t have an issue unless the data location itself was missing the x permission, in which case adding only that should be enough.
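
To see whether the data location itself has the x bit, a quick sketch:

stat -c '%A %U:%G %n' /var/storj/storagenode4/data
# if "x" is missing for the owner, add it non-recursively:
sudo chmod u+x /var/storj/storagenode4/data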

After all the changes, the node is brought up by Docker without restarts.
As I’ve mentioned before, I’ve mounted the storagenode to the /var/storj directory,
which was created on the system disk.
In the idle state I’m getting:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       109G  5.1G   98G   5% /

but when I run docker run ... --name storagenode4 storjlabs/storagenode:latest
I notice the following:

It looks to me like the mounted drive is trying to expand onto the system disk.
Am I right?
If so, how can I set up the mount directory properly for all the nodes?

There are a lot of network-related errors in the log:

ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite: failed to ping stora
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite: failed to ping storag>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 2, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 2, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 2, "error": "ping satellite: failed to ping storag>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 2, "error": "ping satellite: check-in ratelimit: >
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 3, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 3, "error": "ping satellite: failed to ping stora>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 3, "error": "ping satellite: failed to ping storag>
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 3, "error": "ping satellite: check-in ratelimit: >
ERROR   contact:service ping satellite failed   {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 4, "error": "ping satellite: check-in ratelimit: >

But I’ve double-checked the router’s port forwarding parameters; they are exactly the same as for the Windows nodes that work:

And the Linux firewall rules:

14006                      ALLOW IN    192.168.1.XX
28972                      ALLOW IN    Anywhere

I have no clue where to go from here.

It’s easy to check:

df --si -T

If your disk were unmounted from the mount point, the node would use the system disk; however, it would crash due to the absence of the protection file, which exists exactly for this case.

But I think the reason is simpler: if you didn’t redirect logs to a file, you are using the Docker local logs driver, and those logs are stored on the system disk by default.
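
You can confirm where Docker keeps this container’s log; a sketch (by default the log drivers write under /var/lib/docker on the system disk):

docker inspect --format '{{.LogPath}}' storagenode4
sudo du -sh /var/lib/docker/containers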

This means that your node is advertising a wrong or unavailable external address and port. You may also check the exact reason:

docker logs storagenode4 2>&1 | grep "ping satellite failed" | grep -v "rate" | tail

If you redirected logs to a file, then you need to read that file instead:

cat /var/storj/storagenode4/storagenode.log | grep "ping satellite failed" | grep -v "rate" | tail

It looks like I made a mistake in the startup command - I put

-p 28967:28972/tcp \
-p 28967:28972/udp \

so the node was being knocked on at port 28967 :)

It looks like it works (somehow), but I’ve got another problem - I can’t reach the web dashboard via LAN:

command line:
-p 14006:14002

linux firewall:
14006 ALLOW IN 192.168.1.XX

The disk is mounted at:
/dev/sda1 2.7T 1.1T 1.7T 38% /var/storj

but its volume somehow shrank.
I put in a 12 TB disk; it was initialized as 10.9 TB in the beginning.
After the node started, it showed 2.7 TB somehow.
… it looks like the system created a partition somehow, but I didn’t do anything to cause that.
I ran tune2fs -m0, and now the disk shows up as loop with no partition.

On the left side of the colon should be the host’s port; on the right side, the container’s port. For a storagenode with a default config.yaml, the container’s port would be 28967.
Of course, -p works only if you use the Docker default bridge and do not use the --network host option; otherwise you need to provide a unique port for each node on that host, see How to add an additional drive? - Storj Docs
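
For example, with the default container ports (28967 for the node, 14002 for the dashboard) the mappings could look like this, where the left side is whatever you forward on your router:

-p 28972:28967/tcp \
-p 28972:28967/udp \
-p 14006:14002 \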

Then you need to use not the 127.0.0.1 address, but the IP address of the host, if you are checking it from another host in the LAN. Or use How to remote access the web dashboard - Storj Docs
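
A quick check from another machine in the LAN (a sketch; 192.168.1.50 is a placeholder for your node host’s actual LAN IP):

curl -sI http://192.168.1.50:14006/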

Where? On the dashboard? If the database is not updated with the actual usage, the node may believe that it has only this amount of free space, and thus it will automatically reduce the accepted allocation to fit the calculated free space.
This, by the way, may suggest that you forgot to specify the container’s path to the databases with the option

      --storage2.database-dir string                             directory to store databases. if empty, uses data path

and the databases were re-created from scratch.
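
You can verify which location the node actually uses by checking where the *.db files live (a sketch, using the paths from your posts; by default they would be created in the storage subfolder of the data location):

ls -l /var/storj/db/storagenode4/*.db
ls -l /var/storj/storagenode4/data/storage/*.db   # should be empty if database-dir is applied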

No, it’s correct and works:

-p 28972:28967/tcp \
-p 28972:28967/udp \

I got mixed up in a rush - it does work, thank you!


It’s no longer relevant; please disregard it, and forgive me for the inconvenience.


As far as I can see, these settings were made:

nano /var/storj/storagenode4/data/config.yaml

log.output: "/app/config/node4.log"
storage2.database-dir: "dbs"

--mount type=bind,source="/var/storj/db/storagenode4",destination=/app/dbs \

Then it should be ok now.

According to the dashboard, it looks like the node is running; all the user privileges were applied and everything works fine.

Alexey, thank you for the guidance and assistance - it’s priceless help!
As usual :slight_smile:


While this node was running, I got the following error:

ERROR   Error updating service. {"Process": "storagenode-updater", "Service": "storagenode-updater", "error": "context canceled", "errorVerbose": "context canceled\n\tmain.downloadBinary:58\n\tmain.update:40\n\tmain.loopFunc:31\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tmain.cmdRun:138\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tmain.main:22\n\truntime.main:271"}

Should I set up watchtower, or does it not work (as you’ve mentioned in previous topics) and I should dig into something else?
I haven’t installed watchtower because of that.