Node crashes after restart

2024-06-02T08:53:55Z	INFO	lazyfilewalker.used-space-filewalker	starting subprocess	{"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-02T08:53:55Z	ERROR	lazyfilewalker.used-space-filewalker	failed to start subprocess	{"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2024-06-02T08:53:55Z	ERROR	pieces	failed to lazywalk space used by satellite	{"Process": "storagenode", "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:704\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-02T08:53:55Z	INFO	lazyfilewalker.used-space-filewalker	starting subprocess	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-06-02T08:53:55Z	ERROR	lazyfilewalker.used-space-filewalker	failed to start subprocess	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2024-06-02T08:53:55Z	ERROR	pieces	failed to lazywalk space used by satellite	{"Process": "storagenode", "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:704\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-06-02T08:53:55Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context 
canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-02T08:53:55Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: no such file or directory", "errorVerbose": "piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: no such file or directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: no such file or directory
2024-06-02 08:53:55,467 INFO stopped: storagenode (exit status 1)
2024-06-02 08:53:55,468 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

The filewalker came up with three types of errors:

2024-06-02T09:07:56Z	ERROR	collector	error during collecting pieces: 	{"Process": "storagenode", "error": "pieces error: v0pieceinfodb: context canceled", "errorVerbose": "pieces error: v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).GetExpired:193\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:577\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:83\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:56\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:52\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-02T09:06:34Z	INFO	pieces:trash	emptying trash started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-02T09:06:34Z	ERROR	pieces:trash	emptying trash failed	{"Process": "storagenode", "error": "pieces error: lazyfilewalker: signal: killed", "errorVerbose": "pieces error: lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkCleanupTrash:187\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:419\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}

Did you remove it?
If not, make sure that you provided the correct path to your data.
Of course, you may disable this check by re-installing the node… but if it worked before, you need to figure out why it's failing now.
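For context, a minimal sketch of how to check on the host that the file the error complains about actually exists. The container path `config/storage/storage-dir-verification` maps to `<host config mount>/storage/storage-dir-verification`; the path below is assumed from the `docker run` command in this thread, so adjust it to your own mount:

```shell
# Hedged sketch: check whether the setup-time verification file is where
# the container expects it (DATA_DIR is the host dir mounted to /app/config).
DATA_DIR=${DATA_DIR:-/home/storagewars/Hds/HD2}
if [ -f "$DATA_DIR/storage/storage-dir-verification" ]; then
    echo "verification file present"
else
    echo "verification file missing: wrong path, unmounted disk, or deleted file"
fi
```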

The path is correct!

storagewars@raspberrypi:~ $ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  1.8T  0 disk 
└─sda1        8:1    0  1.8T  0 part /home/storagewars/Hds/HD1
sdb           8:16   0  1.8T  0 disk 
└─sdb1        8:17   0  1.8T  0 part /home/storagewars/Hds/HD2
mmcblk0     179:0    0 58.2G  0 disk 
├─mmcblk0p1 179:1    0  256M  0 part /boot
└─mmcblk0p2 179:2    0   58G  0 part /
storagewars@raspberrypi:~ $ sudo docker run -d --restart always --stop-timeout 300 -p 28969:28967/tcp -p 28969:28967/udp -p 127.0.0.1:14002:14002 -e WALLET="0x20ba0ed29b38f63cfe96193b1e85365821a7058a" -e EMAIL="unofficialolym@gmail.com" -e ADDRESS="storagewarz.ddns.net:28969" -e STORAGE="1.7TB" --memory=800m --log-opt max-size=50m --log-opt max-file=10 --mount type=bind,source=/home/storagewars/Hds/HD2/ID2/,destination=/app/identity --mount type=bind,source=/home/storagewars/Hds/HD2,destination=/app/config --name storagenode22 storjlabs/storagenode:latest --operator.wallet-features=zksync

storagewars@raspberrypi:~ $ ls -l /home/storagewars/Hds/HD2
total 60
-rw------- 1 storagewars storagewars 10758 Jun  2 09:59 config.yaml
drwxr-xr-x 2 storagewars storagewars  4096 Nov 30  2023 ID2
drwx------ 2 root        root         4096 Nov 30  2023 lost+found
drwx------ 4 root        root         4096 Aug 29  2023 orders
drwxr-xr-x 2 root        root         4096 May 20 14:23 retain
-rw------- 1 root        root        32768 Jun  2 14:32 revocations.db
drwx------ 6 storagewars storagewars  4096 Jun  2 14:32 storage
-rw------- 1 root        root          933 Jun  2 14:31 trust-cache.json
storagewars@raspberrypi:~ $ ls -l /home/storagewars/Hds/
total 12
drwxrwxrwx 7 root        root        4096 Jun  2 10:30 HD1
drwxrwxrwx 7 root        root        4096 Jun  2 14:31 HD2
-rw-r--r-- 1 storagewars storagewars  116 Aug 29  2023 Nodes_ID
storagewars@raspberrypi:~ $ 

How do I disable the check? Should I change ownership of the Hds folder?

Likely yes: you need to change the owner if you continue using the --user option in your docker run command. If you decide to keep using it, you must also recursively change the owner of the storage directory to your user.
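As a quick sanity check, a sketch comparing your own UID:GID with the current owner of the data directory (the path is taken from this thread; note the `docker run` shown earlier does not actually pass `--user`, so this only matters if you add it):

```shell
# Hedged sketch: if docker run gets --user "$(id -u):$(id -g)", the data
# directory must be owned by those same numeric IDs.
DATA_DIR=${DATA_DIR:-/home/storagewars/Hds/HD2}
echo "you are: $(id -u):$(id -g)"
stat -c 'data dir owned by: %u:%g' "$DATA_DIR" 2>/dev/null || echo "path not found"
```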

Going to change the user and permissions on that folder! Waiting to see if it helps. I don't know why it worked before with the same config! :face_with_peeking_eye:


It seems you changed something? Like the OS or the hardware?

Nothing… I changed nothing! The Pi has been sitting there quietly since its inception!

storagewars@raspberrypi:~ $ ls -l
total 168
-rw-r--r-- 1 storagewars storagewars 123595 Jun  1 23:58 2024-06-01-235835_1600x1200_scrot.png
drwxr-xr-x 2 storagewars storagewars   4096 May  3  2023 Bookshelf
drwxr-xr-x 3 storagewars storagewars   4096 May 31 09:31 Desktop
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Documents
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Downloads
drwxrwxrwx 4 root        root          4096 Jan 17 19:23 Hds
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Music
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Pictures
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Public
drwxr-xr-x 4 storagewars storagewars   4096 Jan 21 15:06 Storage
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Templates
drwxr-xr-x 2 storagewars storagewars   4096 Aug 20  2023 Videos
storagewars@raspberrypi:~ $ sudo chown -R storagewars: /home/storagewars/Hds/
^C
storagewars@raspberrypi:~ $ sudo su
root@raspberrypi:/home/storagewars# cd //
root@raspberrypi://# sudo chown -R storagewars: /home/storagewars/Hds/

When it's done, I'm going to change its permissions!

Permissions and ownership must be assigned like this:

root@raspberrypi://# sudo chown -R storagewars:storagewars /home/storagewars/Hds/


Do you have any other problems with your setup?

It doesn't let me change the owner.


Why?
You can always run this (replace /mnt/storj with your own path):

sudo chown $(id -u):$(id -g) -R /mnt/storj

Do I need to stop the node? Should I do it for both nodes?

Run the command for one node.

storagewars@raspberrypi:~ $ sudo chown $(id -u):$(id -g) -R /home/storagewars/Hds/HD2

It didn't give any output.

storagewars@raspberrypi:~/Hds/HD2/retain $ cd ..
storagewars@raspberrypi:~/Hds/HD2 $ sudo chmod  777 orders
storagewars@raspberrypi:~/Hds/HD2 $ ls -l
total 60
-rwxrwxrwx 1 storagewars storagewars 10758 Jun  2 09:59 config.yaml
drwxrwxrwx 2 storagewars storagewars  4096 Nov 30  2023 ID2
drwxrwxrwx 2 root        root         4096 Nov 30  2023 lost+found
drwxrwxrwx 4 storagewars storagewars  4096 Aug 29  2023 orders
drwxr-xr-x 2 root        root         4096 May 20 14:23 retain
-rwxrwxrwx 1 root        root        32768 Jun  2 16:58 revocations.db
drwxrwxrwx 6 storagewars storagewars  4096 Jun  2 16:58 storage
-rwxrwxrwx 1 root        root          933 Jun  2 16:58 trust-cache.json
storagewars@raspberrypi:~/Hds/HD2 $ cd ..
storagewars@raspberrypi:~/Hds $ cd HD1
storagewars@raspberrypi:~/Hds/HD1 $ ls
config.yaml  ID  lost+found  orders  retain  revocations.db  storage  trust-cache.json
storagewars@raspberrypi:~/Hds/HD1 $ ls -l
total 72
-rwxrwxrwx 1 storagewars storagewars 10768 Jan 21 20:14 config.yaml
drwxrwxrwx 3 storagewars storagewars  4096 Jan 21 14:58 ID
drwx------ 2 root        root        16384 Jan 17 19:22 lost+found
drwx------ 4 root        root         4096 Jan 21 20:22 orders
drwxr-xr-x 2 root        root         4096 Jun  1 09:22 retain
-rw------- 1 root        root        32768 Jun  2 16:52 revocations.db
drwxrwxrwx 6 storagewars storagewars  4096 Jun  2 17:32 storage
-rw------- 1 root        root          933 Jun  2 16:52 trust-cache.json
storagewars@raspberrypi:~/Hds/HD1 $ sudo docker start storagenode22
storagenode22
storagewars@raspberrypi:~/Hds/HD1 $ ls -l


Getting tired! Could it be a closed port?
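A closed or misconfigured port forward is worth ruling out. A rough sketch using bash's `/dev/tcp` special files (host and port taken from the `docker run` above; run it from outside your LAN, since NAT loopback often fails from inside):

```shell
# Hedged sketch: test whether the node's external port accepts TCP connections.
host=${HOST:-storagewarz.ddns.net}
port=${PORT:-28969}
if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
else
    echo "$host:$port closed or filtered"
fi
```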

You need to wait until it finishes. It changes the owner of all files.
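One way to tell whether the recursive chown has actually finished (a sketch; path and user name are from this thread, adjust to your own):

```shell
# Hedged sketch: list files under the data directory still NOT owned by the
# expected user. Empty output means the recursive chown completed.
DATA_DIR=${DATA_DIR:-/home/storagewars/Hds/HD2}
OWNER=${OWNER:-storagewars}
find "$DATA_DIR" ! -user "$OWNER" -print 2>/dev/null | head -n 20
```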

Did you check it?
And do you have errors in the logs, like "ping satellite failed" (except "rate")?

I'm not able to see it now! I'll check when I'm home.


storagewars@raspberrypi:~/Hds/HD1 $ sudo docker logs storagenode22 | grep 'ping'

2024-06-03T21:22:38Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup saltlake.tardigrade.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2024-06-03T21:22:38Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled", "errorVerbose": "ping satellite: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
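Note these failures are DNS lookups being canceled ("dial tcp: lookup … operation was canceled"), not rejections by the satellites. A rough sketch to check name resolution from the host (`getent` is standard on Linux; hostnames taken from the log above):

```shell
# Hedged sketch: check that the satellite hostnames from the log resolve.
for host in saltlake.tardigrade.io eu1.storj.io; do
    if getent hosts "$host" > /dev/null; then
        echo "$host resolves"
    else
        echo "$host: DNS lookup failed"
    fi
done
```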

I had two ports on the same rule; I changed it to two rules on the same port! Let's wait.