Node continuously restarting; Error: rpc: dial tcp 127.0.0.1:7778: connect: connection refused; Running as root

I moved my node to a new machine and made quite a few mistakes in the process. A few points about my environment:

  • I am running it on a Raspberry Pi inside an Argon EON Pi NAS, which fixed the HDD power problems that prompted me to get the new machine;
  • The identity was created on the old machine using my default user “michael” and not pi or root/sudo;
  • I changed ownership of my identity and storagenode/orders and storagenode/storage to root using chown (after failing to set it up using my “michael” user);
  • I removed and reinstalled docker and I did not add the “michael” user to the docker group (since the reinstall, I have been running docker commands using sudo).
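Since running every docker command through sudo is a workaround rather than a fix, here is a generic hedged check (not from this thread) for whether the current user is in the docker group:

```shell
# Check whether the current user is already in the docker group;
# membership is what lets you run docker without sudo.
if id -nG | grep -qw docker; then
  msg="already in the docker group"
else
  msg="not in the docker group; fix with: sudo usermod -aG docker \$USER (then log out and back in)"
fi
echo "$msg"
```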

My docker-compose.yml file is as follows:

version: '3.3'

services:
  storagenode:
    command: --operator.wallet-features='zksync'
    container_name: 'storagenode'
    environment:
      - WALLET='<redacted>'
      - EMAIL='<redacted>'
      - ADDRESS='<redacted>:28967'
      - STORAGE='11200GB'
    image: 'storjlabs/storagenode:latest'
    ports:
      - '28967:28967/tcp'
      - '28967:28967/udp'
      - '14002:14002/tcp'
    restart: 'always'
    stop_grace_period: '300s'
    volumes:
      - type: 'bind'
        source: '/mnt/storj/identity'
        target: '/app/identity'
      - type: 'bind'
        source: '/mnt/storj'
        target: '/app/config'

My storj directories are as follows:

/mnt/storj/identity
/mnt/storj/storagenode/orders
/mnt/storj/storagenode/storage

I tried adding the line user: "$(id -u):$(id -g)" to my docker-compose.yml, which didn’t work, I guess because I had already changed everything to root. That line was never in my compose file before. I would prefer to run as root because it keeps my backups straightforward, but I can rethink my backup strategy if that is what it takes to get my Storj node back online.
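As an aside, that line can never work as written: Compose does not perform shell command substitution, so $(id -u) is passed through literally. It does interpolate ${VAR} from the environment, so one workaround looks like this (a generic sketch; PUID/PGID are arbitrary names I made up, not Storj settings):

```shell
# Export the IDs in the shell, where $(...) IS expanded...
export PUID="$(id -u)" PGID="$(id -g)"
# ...then docker-compose.yml can reference them as:  user: "${PUID}:${PGID}"
echo "would run as $PUID:$PGID"
```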

I have been struggling with this for over a week and reviewing the forums for help. Could someone help me out directly?

Well, let’s take a look at the logs. I’ll need to see what it says as it starts up, not just the last error…

Which logs? The output of sudo docker exec -it storagenode ls /var/log is as follows:

apt  btmp  dpkg.log  faillog  lastlog  supervisor  wtmp

Thanks

Try the following:

docker logs --tail 20 storagenode

2023-09-08 00:31:32,569 INFO supervisord started with pid 1
2023-09-08 00:31:33,573 INFO spawned: 'processes-exit-eventlistener' with pid 11
2023-09-08 00:31:33,578 INFO spawned: 'storagenode' with pid 12
2023-09-08 00:31:33,583 INFO spawned: 'storagenode-updater' with pid 13
2023-09-08T00:31:33Z	INFO	Anonymized tracing enabled	{"Process": "storagenode-updater"}
2023-09-08T00:31:33Z	INFO	Running on version	{"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.86.1"}
2023-09-08T00:31:33Z	INFO	Downloading versions.	{"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-09-08T00:31:33Z	INFO	Anonymized tracing enabled	{"process": "storagenode"}
2023-09-08T00:31:33Z	INFO	Operator email	{"process": "storagenode", "Address": "michaelgill1969@gmail.com"}
2023-09-08T00:31:33Z	INFO	Operator wallet	{"process": "storagenode", "Address": "0x29053f0779A10C28A9CdFF38C7AA55733C593Efc"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-09-08 00:31:33,699 INFO exited: storagenode (exit status 1; not expected)
2023-09-08T00:31:33Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.86.1"}
2023-09-08T00:31:33Z	INFO	Version is up to date	{"Process": "storagenode-updater", "Service": "storagenode"}
2023-09-08T00:31:33Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.86.1"}
2023-09-08T00:31:33Z	INFO	Version is up to date	{"Process": "storagenode-updater", "Service": "storagenode-updater"}

Is it the file permissions?

The output of ls -ahl /mnt/storj/storagenode is as follows:

total 60K
drwxr-xr-x 4 root root 4.0K Sep  5 18:00 .
drwxr-xr-x 5 root root 4.0K Sep  7 14:43 ..
-rw------- 1 root root 9.2K Sep  4 15:51 config.yaml
drwx------ 4 root root 4.0K Jul 19  2022 orders
-rw------- 1 root root  32K Aug 30 16:08 revocations.db
drwx------ 6 root root 4.0K Sep  4 16:48 storage

The output of sudo ls -ahl /mnt/storj/storagenode/storage is as follows:

total 52M
drwx------ 6 root root 4.0K Sep  4 16:48 .
drwxr-xr-x 4 root root 4.0K Sep  5 18:00 ..
-rw-r--r-- 1 root root  38M Aug 31 18:08 bandwidth.db
drwx------ 8 root root 4.0K Jul 20  2022 blobs
-rw-r--r-- 1 root root 485K Oct 25  2022 dump_all_notrans.sql
-rw-r--r-- 1 root root 485K Oct 25  2022 dump_all.sql
drwx------ 2 root root 4.0K Aug 31 18:08 garbage
-rw-r--r-- 1 root root  64K Aug 31 16:38 heldamount.db
-rw-r--r-- 1 root root  16K Aug 30 16:38 info.db
-rw-r--r-- 1 root root  24K Aug 30 16:38 notifications.db
-rw-r--r-- 1 root root  32K Aug 30 16:38 orders.db
-rw-r--r-- 1 root root  32K Aug 31 17:53 orders.db-shm
-rw-r--r-- 1 root root    0 Aug 31 17:53 orders.db-wal
-rw-r--r-- 1 root root  12M Aug 31 17:38 piece_expiration.db
-rw-r--r-- 1 root root  32K Aug 31 18:08 piece_expiration.db-shm
-rw-r--r-- 1 root root  65K Aug 31 18:08 piece_expiration.db-wal
-rw-r--r-- 1 root root  24K Aug 30 16:38 pieceinfo.db
-rw-r--r-- 1 root root  32K Aug 31 18:08 pieceinfo.db-shm
-rw-r--r-- 1 root root    0 Aug 31 18:08 pieceinfo.db-wal
-rw-r--r-- 1 root root  24K Aug 30 16:38 piece_spaced_used.db
-rw-r--r-- 1 root root  24K Aug 30 16:38 pricing.db
-rw-r--r-- 1 root root  36K Aug 31 16:43 reputation.db
-rw-r--r-- 1 root root  32K Aug 30 17:08 satellites.db
-rw-r--r-- 1 root root  32K Aug 31 17:57 satellites.db-shm
-rw-r--r-- 1 root root    0 Aug 31 17:57 satellites.db-wal
-rw-r--r-- 1 root root  24K Aug 30 16:38 secret.db
-rw-r--r-- 1 root root   32 Jul 19  2022 storage-dir-verification
-rw-r--r-- 1 root root 496K Aug 31 16:38 storage_usage.db
drwx------ 2 root root  12K Aug 28 20:21 temp
drwx------ 8 root root 4.0K Jul 20  2022 trash
-rw-r--r-- 1 root root  20K Aug 30 16:38 used_serial.db

Does this explain the “Error: Error starting master database on storagenode: group:” line in the log above?

Yes, or the paths mapped in the config file are no longer correct.
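The mount mismatch alone already explains the stat errors: with source /mnt/storj mounted at /app/config, the node resolves config/storage/blobs to /mnt/storj/storage/blobs on the host, but the data lives one level deeper. A sketch with a temp dir standing in for /mnt/storj:

```shell
# Simulate the host layout from this thread.
host=$(mktemp -d)
mkdir -p "$host/storagenode/storage/blobs"            # where the data actually lives

# What the current (wrong) mount resolves config/storage/blobs to:
[ -d "$host/storage/blobs" ] && a=present || a=missing
# What a mount of $host/storagenode would resolve it to:
[ -d "$host/storagenode/storage/blobs" ] && b=present || b=missing

echo "$a $b"    # the wrong mount finds nothing; the deeper one finds the data
rm -rf "$host"
```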

The only paths I see in config.yaml are the following:

# path to the certificate chain for this identity
identity.cert-path: identity/identity.cert

# path to the private key for this identity
identity.key-path: identity/identity.key

I am trying to regenerate the config.yaml. First, I backed up the old one and removed it.

Then I ran the command as follows:

sudo docker run --rm -e SETUP="true" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/mnt/storj/identity",destination=/app/identity \
    --mount type=bind,source="/mnt/storj",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

It fails with the last line Error: open /app/config/config.yaml1689024424: permission denied. It does not generate a new config.yaml. I recall this is the same result as yesterday, which is why I have been using an old config.yaml.
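That permission error is consistent with the earlier ls output: the container is started with --user $(id -u):$(id -g), i.e. as an unprivileged user, but the target directory is owned by root, so the setup process cannot create config.yaml there. Ownership and mode are easy to inspect with stat (demonstrated on a temp dir, since the /mnt/storj paths only exist on that machine):

```shell
# stat -c prints owner:group and the octal mode; on the real machine you would
# run it against /mnt/storj instead of a temp dir.
d=$(mktemp -d)
perm=$(stat -c '%U:%G %a' "$d")
echo "$perm"    # e.g. "michael:michael 700"
rm -rf "$d"
```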

I still get the same log as follows:

--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-09-08 01:36:13,900 INFO exited: storagenode (exit status 1; not expected)
2023-09-08T01:36:14Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.86.1"}
2023-09-08T01:36:14Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode"}
2023-09-08T01:36:14Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.86.1"}
2023-09-08T01:36:14Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-09-08 01:36:15,294 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-09-08 01:36:15,301 INFO spawned: 'storagenode' with pid 85
2023-09-08 01:36:15,302 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-09-08T01:36:15Z    INFO    Anonymized tracing enabled      {"process": "storagenode"}
2023-09-08T01:36:15Z    INFO    Operator email  {"process": "storagenode", "Address": "michaelgill1969@gmail.com"}
2023-09-08T01:36:15Z    INFO    Operator wallet {"process": "storagenode", "Address": "0x29053f0779A10C28A9CdFF38C7AA55733C593Efc"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-09-08 01:36:15,475 INFO exited: storagenode (exit status 1; not expected)
2023-09-08 01:36:17,482 INFO spawned: 'storagenode' with pid 94

See here…

Hello @michaelgill1969,
Welcome to the forum!

Since you use /mnt/storj/storagenode as the data location, this section of your docker-compose.yml:

      - type: 'bind'
        source: '/mnt/storj'
        target: '/app/config'

should be:

      - type: 'bind'
        source: '/mnt/storj/storagenode'
        target: '/app/config'

and in your docker run command, this mount:

    --mount type=bind,source="/mnt/storj",destination=/app/config \

should be:

    --mount type=bind,source="/mnt/storj/storagenode",destination=/app/config \

However, since you already have data, you must never run the SETUP step again for this node. That command should be executed only once per identity, for its entire life; otherwise you may break the node by pointing it at the wrong data/identity locations (as you did just now) and it will be disqualified. This time you were lucky that your user doesn’t have permissions on /mnt/storj, and that saved your node.
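As a hedged safety check before ever considering SETUP again: an already-initialized data directory contains storage/storage-dir-verification (visible in the ls output above), so testing for it first prevents an accidental re-run:

```shell
# Guard sketch: refuse SETUP if the data directory is already initialized.
data=/mnt/storj/storagenode    # path from this thread
if [ -e "$data/storage/storage-dir-verification" ]; then
  verdict="already initialized, never run SETUP again"
else
  verdict="no node data found here"
fi
echo "$verdict"
```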


Thanks for your help, Knowledge and Alexey. Unfortunately, I think the node would need the “How to fix a ‘database disk image is malformed’” procedure, which is more than I want to get into. I’ll try to start from scratch with a new node.

This is not needed. You may either fix them:

or re-create them:

In the latter case you will lose the historic data, and the dashboard will show wrong numbers for the current month.
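For reference, the first step of the repair route is simply checking each database with sqlite3 while the node is stopped (a minimal sketch on a throwaway database, not the full procedure from the linked article):

```shell
# Create a throwaway database and run SQLite's built-in consistency check.
db="$(mktemp -d)/test.db"
sqlite3 "$db" "CREATE TABLE t(x INTEGER); INSERT INTO t VALUES (1);"
result=$(sqlite3 "$db" "PRAGMA integrity_check;")
echo "$result"    # a healthy database prints "ok"
rm -f "$db"
```

On a real node you would point this at each *.db file in the storage directory; any database that does not report "ok" is a candidate for the dump-and-rebuild procedure.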
