New node restarts in a loop

I just set up a new node, and it's consistently restarting after 11 seconds.
Running sudo docker exec -it storagenode /app/dashboard.sh shows this error:

Error: rpc: dial tcp 127.0.0.1:7778: connect: connection refused

Last 20 lines from the logs, using sudo docker logs --tail 20 storagenode:

--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-11-30 02:04:33,262 INFO exited: storagenode (exit status 1; not expected)
2023-11-30 02:04:36,270 INFO spawned: 'storagenode' with pid 59
2023-11-30T02:04:36Z    INFO    Anonymized tracing enabled      {"process": "storagenode"}
2023-11-30T02:04:36Z    INFO    Operator email  {"process": "storagenode", "Address": "EMAIL"}
2023-11-30T02:04:36Z    INFO    Operator wallet {"process": "storagenode", "Address": "0xXXXXXXX"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-11-30 02:04:36,367 INFO exited: storagenode (exit status 1; not expected)
2023-11-30 02:04:37,368 INFO gave up: storagenode entered FATAL state, too many start retries too quickly
2023-11-30 02:04:38,371 WARN received SIGQUIT indicating exit request
2023-11-30 02:04:38,372 INFO waiting for processes-exit-eventlistener, storagenode-updater to die
2023-11-30T02:04:38Z    INFO    Got a signal from the OS: "terminated"  {"Process": "storagenode-updater"}
2023-11-30 02:04:38,380 INFO stopped: storagenode-updater (exit status 0)
2023-11-30 02:04:39,385 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Are the mounts correct? It seems like the files cannot be found.


My drive is mounted and auto-mounted at boot.
As for the error, I don't even know where those files would be located.

They should be on the disk with the data, if the mount point is correct.

Please configure a static mount instead: How do I setup static mount via /etc/fstab for Linux? - Storj Docs
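For example, an /etc/fstab entry along these lines (the UUID is a placeholder; get the real one from sudo blkid, and adjust the mount point and filesystem type to yours):

# replace the UUID with the output of: sudo blkid /dev/sdb1
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/drive1  ext4  defaults  0  2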

You’re sure the drive is still mounted? It’s not an accidentally disconnected USB drive for example?
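A quick way to check (substitute your actual mount point; /mnt/drive1 here is just an example):

findmnt /mnt/drive1   # prints the source device and filesystem only if something is mounted there
ls /mnt/drive1        # should show the identity and data folders, not an empty directory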

Have you tried restarting the whole operating system, preferably a cold boot? Experience teaches that many problems are solved by a simple power-down/power-up.


This is what I meant, it's already configured as a static mount. I don't really see any Storj-related data at all, so I thought it might be a compatibility issue. Does Storj work on Debian 12 (Bookworm)? I see the setup guide says to use Debian 9 or 10.

Yeah I’m sure. I even checked using lsblk and df -h.

Please show your docker run command. You may mask personal info, but copy everything else (like the --mount options) without obfuscation.

sudo docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 14002:14002 \
    -e WALLET="0x—" \
    -e EMAIL="email" \
    -e ADDRESS="NOIPaddress:28967" \
    -e STORAGE="14TB" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/mnt/drive1/identity",destination=/app/identity \
    --mount type=bind,source="/mnt/drive1/data",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest --operator.wallet-features=zksync-era,zksync

Please show the results of these commands:

ls -l /mnt/drive1/data
ls -l /mnt/drive1/data/storage

also

df -T --si

Yes, Debian 12 is not a problem at all.

Your situation looks like a wrong mount or data loss of some kind.

Have you already done that cold reboot?

So I fixed it by changing the local IP of the rpi device. It seems it may have been conflicting with the old node I had before (which I don't have anymore). I'm not sure why that would have conflicted if it no longer existed.

@JWvdV, so you are aware too.

This is very unlikely. The error was about missing folders, not unavailability at the network level. You likely did something else, like running the SETUP step a second time. You shouldn't run it more than once for the entire node's life, otherwise you may destroy the node.
I would hope that this is a new node which was never run before, and that you otherwise provided a correct path for the data. But in the latter case there is still the question of why these folders were missing, which again points to a wrongly provided data path.
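For reference, the one-time setup run from the docs looks roughly like this (with your own paths substituted), and it must be executed only once per node:

sudo docker run --rm -e SETUP="true" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/mnt/drive1/identity",destination=/app/identity \
    --mount type=bind,source="/mnt/drive1/data",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest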

I'm not entirely sure, because I was careful and did everything correctly: I ran the setup once, provided the correct paths, double-checked everything I was running/typing, etc.

When I changed the rpi's IP, I also reinstalled the OS and redid the entire process, but kept the verified identity files (from a backup). I did the EXACT same thing throughout this setup process as before. The only differences are that the device IP is different and the NOIP address is also different.

So you executed it with SETUP=true again?
I insist on seeing the results of these commands, before it is too late:

I did NOT execute the SETUP command again. I erased and reinstalled Debian on the primary drive and redid the entire step-by-step guide (Quickstart Node Setup - Storj Docs).

But as you asked, here are the results:

ls -l /mnt/drive1/data

total 48
-rw------- 1 pi pi 10775 Nov 30 07:10 config.yaml
drwx------ 4 pi pi  4096 Nov 30 07:11 orders
-rw------- 1 pi pi 32768 Nov 30 07:11 revocations.db
drwx------ 6 pi pi  4096 Nov 30 08:01 storage
-rw------- 1 pi pi   933 Nov 30 07:11 trust-cache.json
ls -l /mnt/drive1/data/storage

total 8156
-rw-r--r-- 1 pi pi  856064 Nov 30 08:03 bandwidth.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:04 bandwidth.db-shm
-rw-r--r-- 1 pi pi 4181832 Nov 30 08:04 bandwidth.db-wal
drwx------ 5 pi pi    4096 Nov 30 07:12 blobs
drwx------ 2 pi pi    4096 Nov 30 07:10 garbage
-rw-r--r-- 1 pi pi   32768 Nov 30 07:41 heldamount.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:00 heldamount.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 08:00 heldamount.db-wal
-rw-r--r-- 1 pi pi   16384 Nov 30 07:41 info.db
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 notifications.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:00 notifications.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 08:00 notifications.db-wal
-rw-r--r-- 1 pi pi   32768 Nov 30 07:41 orders.db
-rw-r--r-- 1 pi pi   32768 Nov 30 07:41 orders.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 07:41 orders.db-wal
-rw-r--r-- 1 pi pi   69632 Nov 30 07:41 piece_expiration.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:04 piece_expiration.db-shm
-rw-r--r-- 1 pi pi 2607992 Nov 30 08:04 piece_expiration.db-wal
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 pieceinfo.db
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 piece_spaced_used.db
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 pricing.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:00 pricing.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 08:00 pricing.db-wal
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 reputation.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:00 reputation.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 08:00 reputation.db-wal
-rw-r--r-- 1 pi pi   32768 Nov 30 07:41 satellites.db
-rw-r--r-- 1 pi pi   32768 Nov 30 07:41 satellites.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 07:41 satellites.db-wal
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 secret.db
-rw-r--r-- 1 pi pi      32 Nov 30 07:10 storage-dir-verification
-rw-r--r-- 1 pi pi   24576 Nov 30 07:41 storage_usage.db
-rw-r--r-- 1 pi pi   32768 Nov 30 08:00 storage_usage.db-shm
-rw-r--r-- 1 pi pi       0 Nov 30 08:00 storage_usage.db-wal
drwx------ 2 pi pi    4096 Nov 30 08:04 temp
drwx------ 2 pi pi    4096 Nov 30 07:10 trash
-rw-r--r-- 1 pi pi   20480 Nov 30 07:41 used_serial.db
df -T --si

Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs          tmpfs     819M  1.4M  818M   1% /run
/dev/sda2      ext4      492G  2.7G  465G   1% /
tmpfs          tmpfs     4.1G     0  4.1G   0% /dev/shm
tmpfs          tmpfs     5.3M   17k  5.3M   1% /run/lock
/dev/sda1      vfat      535M   64M  471M  12% /boot/firmware
tmpfs          tmpfs     819M     0  819M   0% /run/user/1000
/dev/sdb1      ext4       16T  2.8G   16T   1% /mnt/drive1

Looks correct. Then this error is weird; it should not be possible with this path.

This is a relief. Then your node should be fine.

Yes, I am happy it is up and working, though I am upset that I made a silly mistake and had to start all over.

See, I needed to reconfigure NOIP due to an Internet switch, but I was tired and accidentally mistyped, deleting the root of my system with sudo rm -r ~/*. I lost everything.

What can be backed up and restored in the event of any future failure? Identity files? An entire system image?

rm -r ~/* doesn't delete the root file system, although it does delete the home directory of the user it runs as.
As far as I can see, you shouldn't have been harmed, because you mounted the data under /mnt.

My mistake, home, not root. The problem was that my identity files were also in that directory when I had that node, and I didn't have a backup of them.

This time around, I put the identity files on the actual drive and backed them up to an external drive and the cloud.
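A minimal sketch of that backup, assuming the identity lives at /mnt/drive1/identity as in the run command above (the destination paths are just examples):

# archive the identity directory into the home folder
tar -czf ~/storagenode-identity-backup.tar.gz -C /mnt/drive1 identity

# or mirror it to an external drive mounted at /mnt/backup
rsync -a /mnt/drive1/identity/ /mnt/backup/storagenode-identity/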