Issue with Storj: Docker Container Restarting in a Loop

Hello everyone,

I’m encountering an issue with my Storj node. My Docker container keeps restarting in a loop and I’m seeing errors in the logs. Here are the details of my configuration and the errors I’m encountering:

Docker Container Configuration

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0x9c6c8e7dc6b8bb264469abbddbfd985cfe33888e" \
    -e EMAIL="myemail@myemail.com" \
    -e ADDRESS="nelexstorj.ddns.net:28967" \
    -e STORAGE="4TB" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/home/nelexstorj/.local/share/storj/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/mnt/disk1",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

Docker Container Logs

sudo docker logs --tail 50 storagenode
2024-05-26T11:53:38Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-05-26T11:53:38Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-05-26T11:53:38Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: ", "errorVerbose": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: \n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: 
2024-05-26 11:53:38,863 INFO exited: storagenode (exit status 1; not expected)
2024-05-26 11:53:39,867 INFO spawned: 'storagenode' with pid 46
2024-05-26 11:53:39,868 WARN received SIGQUIT indicating exit request
2024-05-26 11:53:39,868 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-05-26T11:53:39Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
2024-05-26 11:53:39,871 INFO stopped: storagenode-updater (exit status 0)
2024-05-26T11:53:39Z	INFO	Configuration loaded	{"Process": "storagenode", "Location": "/app/config/config.yaml"}
2024-05-26T11:53:39Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2024-05-26T11:53:39Z	INFO	Operator email	{"Process": "storagenode", "Address": "nelson.sobralrosa@gmail.com"}
2024-05-26T11:53:39Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "0x9c6c8e7dc6b8bb264469abbddbfd985cfe33888e"}
2024-05-26T11:53:39Z	INFO	server	kernel support for server-side tcp fast open remains disabled.	{"Process": "storagenode"}
2024-05-26T11:53:39Z	INFO	server	enable with: sysctl -w net.ipv4.tcp_fastopen=3	{"Process": "storagenode"}
2024-05-26T11:53:40Z	INFO	Telemetry enabled	{"Process": "storagenode", "instance ID": "126aKMo19eBKkVhXnHmm34sax4F3M6CpqGJYNLSy4eJTNoVK39P"}
2024-05-26T11:53:40Z	INFO	Event collection enabled	{"Process": "storagenode", "instance ID": "126aKMo19eBKkVhXnHmm34sax4F3M6CpqGJYNLSy4eJTNoVK39P"}
2024-05-26T11:53:40Z	INFO	db.migration	Database Version	{"Process": "storagenode", "version": 57}
2024-05-26T11:53:40Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	Node 126aKMo19eBKkVhXnHmm34sax4F3M6CpqGJYNLSy4eJTNoVK39P started	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	Public server started on [::]:28967	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	Private server started on 127.0.0.1:7778	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	bandwidth	Persisting bandwidth usage cache to db	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.	{"Process": "storagenode"}
2024-05-26T11:53:41Z	INFO	pieces:trash	emptying trash started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-26T11:53:41Z	INFO	lazyfilewalker.trash-cleanup-filewalker	starting subprocess	{"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-26T11:53:41Z	INFO	trust	Scheduling next refresh	{"Process": "storagenode", "after": "6h10m10.21397249s"}
2024-05-26T11:53:41Z	INFO	lazyfilewalker.trash-cleanup-filewalker	subprocess started	{"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-26T11:53:41Z	WARN	piecestore:monitor	Disk space is less than requested. Allocated space is	{"Process": "storagenode", "bytes": 3736761913344}
2024-05-26T11:53:41Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: ", "errorVerbose": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: \n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-05-26T11:53:41Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-05-26T11:53:41Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-05-26T11:53:41Z	ERROR	gracefulexit:chore	error retrieving satellites.	{"Process": "storagenode", "error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-05-26T11:53:41Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-05-26T11:53:41Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-05-26T11:53:41Z	ERROR	version	failed to get process version info	{"Process": "storagenode", "error": "version checker client: Get \"https://version.storj.io\": context canceled", "errorVerbose": "version checker client: Get \"https://version.storj.io\": context canceled\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tstorj.io/storj/private/version/checker.(*Client).Process:89\n\tstorj.io/storj/private/version/checker.(*Service).checkVersion:104\n\tstorj.io/storj/private/version/checker.(*Service).CheckVersion:78\n\tstorj.io/storj/storagenode/version.(*Chore).checkVersion:115\n\tstorj.io/storj/storagenode/version.(*Chore).RunOnce:71\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/version.(*Chore).Run:64\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-05-26T11:53:41Z	ERROR	nodestats:cache	Get pricing-model/join date failed	{"Process": "storagenode", "error": "context canceled"}
2024-05-26T11:53:41Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-05-26T11:53:41Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-26T11:53:41Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-05-26T11:53:41Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-26T11:53:41Z	INFO	lazyfilewalker.trash-cleanup-filewalker	subprocess exited with status	{"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "status": -1, "error": "signal: killed"}
2024-05-26T11:53:41Z	ERROR	pieces:trash	emptying trash failed	{"Process": "storagenode", "error": "pieces error: lazyfilewalker: signal: killed", "errorVerbose": "pieces error: lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkCleanupTrash:187\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:419\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-05-26T11:53:41Z	ERROR	gracefulexit:blobscleaner	couldn't receive satellite's GE status	{"Process": "storagenode", "error": "context canceled"}
2024-05-26T11:53:41Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: ", "errorVerbose": "piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: \n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: error verifying location and/or readability of storage directory: content of file is not a valid node ID: 
2024-05-26 11:53:41,165 INFO stopped: storagenode (exit status 1)
2024-05-26 11:53:41,166 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Identity directory

nelexstorj@ubuntu:/mnt/disk1$ ls -l /home/nelexstorj/.local/share/storj/identity/storagenode
total 24
-rwx------ 1 nelexstorj nelexstorj  558 May 26 12:22 ca.1716718937.cert
-rwx------ 1 nelexstorj nelexstorj 1088 May 26 12:22 ca.cert
-rwx------ 1 nelexstorj nelexstorj  241 May 26 00:12 ca.key
-rwx------ 1 nelexstorj nelexstorj 1096 May 26 12:22 identity.1716718937.cert
-rwx------ 1 nelexstorj nelexstorj 1626 May 26 12:22 identity.cert
-rwx------ 1 nelexstorj nelexstorj  241 May 26 00:12 identity.key

Storage disk

nelexstorj@ubuntu:/mnt/disk1$ ls -l /mnt/disk1
total 104
-rwx------ 1 nelexstorj nelexstorj   558 May 24 13:31 ca.1716539445.cert
-rwx------ 1 nelexstorj nelexstorj  1088 May 24 13:31 ca.cert
drwx------ 2 nelexstorj nelexstorj  4096 May 24 13:26 config
-rwx------ 1 nelexstorj nelexstorj 11645 May 24 21:57 config.yaml
drwx------ 3 nelexstorj nelexstorj  4096 May 24 13:31 identity
-rwx------ 1 nelexstorj nelexstorj  1096 May 24 13:31 identity.1716539445.cert
-rwx------ 1 nelexstorj nelexstorj  1626 May 24 13:31 identity.cert
-rwx------ 1 nelexstorj nelexstorj   241 May 24 13:31 identity.key
drwx------ 2 nelexstorj nelexstorj  4096 May 24 14:23 identityfolder
drwx------ 2 nelexstorj nelexstorj 16384 May 24 10:46 lost+found
drwx------ 4 nelexstorj nelexstorj  4096 May 24 16:27 orders
drwx------ 2 nelexstorj nelexstorj  4096 May 24 16:27 retain
-rwx------ 1 nelexstorj nelexstorj 32768 May 26 14:43 revocations.db
-rwx------ 1 nelexstorj nelexstorj   487 May 24 16:25 start-storagenode.sh
drwx------ 5 nelexstorj nelexstorj  4096 May 26 14:43 storage
-rw------- 1 nelexstorj nelexstorj   933 May 26 14:43 trust-cache.json

Mapping

sda           8:0    0   3.6T  0 disk 
└─sda1        8:1    0   3.6T  0 part /mnt/disk1

Docker list

CONTAINER ID   IMAGE                          COMMAND         CREATED             STATUS                          PORTS     NAMES
4568e3031605   storjlabs/storagenode:latest   "/entrypoint"   About an hour ago   Restarting (0) 31 seconds ago             storagenode

Hardware and linux version

Rasperry pi5
Disk 4Tb USB
ubuntu 23.10

Any idea what could be wrong ?

Cheers

You are trying to run with a wrong identity. Please check the path to the identity.

Did you manually fiddle with the Identity file permissions? Mine were created with the standard commands and look like this:

-rw-r--r--  ca.cert
-rw-r--r--  identity.cert
-rw-------  identity.key

Probably nothing to do with it: but worth mentioning.

… and there are Identity files in two places?? I see them in the “/home/nelexstorj/.local/share/storj/identity/storagenode” path in the command… but also in the “/mnt/disk1” output?

Is this a node being migrated from someplace else?

Actually… that’s not a standard directory either: perhaps even more Identity files in there?

And the permissions in /mnt/disk1 aren’t standard: like isn’t retain normally drwxr-xr-x ?

I guess you need to replace that mount:

to

    --mount type=bind,source="/mnt/disk1",destination=/app/identity \

Since you have your identity’s files right in the config directory:

Or, perhaps

    --mount type=bind,source="/mnt/disk1/identity",destination=/app/identity \

because you have there such a folder:

or maybe

    --mount type=bind,source="/mnt/disk1/identityfolder",destination=/app/identity \

because you have this folder too:

Why so many identities? And which is a correct one?

Hi, thank you for your help. The issue was related to an incorrect path. When it still didn’t work, I tried playing with the permissions and ended up messing up the system. So, I reinstalled everything and recreated the node from scratch. Everything is working now, and I’m online.
I also used this website in parallel of the official tutorial to help me to setup the node
https://insights.ditatompel.com/en/tutorials/how-to-host-a-storj-node/