Why is my container restarting all the time?

For some reason, my container on my RPi 4B is restarting all the time. What could it be?
It restarts every 10 seconds.

Have you checked the log for errors?

How do I check the logs?

If you haven’t redirected your logs to a file, you can use the following command:

docker logs --tail 50 storagenode

This will give you the last 50 lines from the storagenode log. From https://documentation.storj.io/resources/faq/check-logs
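If you want to watch the log live while the container keeps restarting, something like this should also work:

docker logs --tail 20 --follow storagenode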

What command did you use to create the container? I had a node that did this when it was passed an invalid Storj flag.
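If you no longer have the exact command, you can usually recover what the container was created with, for example (assuming the container is named storagenode, as in the docs):

docker inspect --format '{{json .Config.Env}}' storagenode
docker inspect --format '{{json .Mounts}}' storagenode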

docker logs --tail 50 storagenode

panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range

goroutine 862 [running]:
github.com/spacemonkeygo/monkit/v3.newSpan.func1(0x0)
/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.7-0.20200515175308-072401d8c752/ctx.go:147 +0x2e0
panic(0x8aafc8, 0xa609b8)
/usr/local/go/src/runtime/panic.go:969 +0x118
github.com/spacemonkeygo/monkit/v3.newSpan.func1(0x0)
/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.7-0.20200515175308-072401d8c752/ctx.go:147 +0x2e0
panic(0x8aafc8, 0xa609b8)
/usr/local/go/src/runtime/panic.go:975 +0x3c4
storj.io/storj/storagenode/orders.readLimitAndOrder(0xa6c850, 0x17e0558, 0x1ae8500, 0x14999c0, 0x0, 0x0)
/go/src/storj.io/storj/storagenode/orders/store.go:536 +0x9c
storj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1(0x1ae7880, 0x69, 0xa777e0, 0x1add950, 0x0, 0x0, 0x0, 0x0)
/go/src/storj.io/storj/storagenode/orders/store.go:242 +0x3a4
path/filepath.walk(0x1ae7880, 0x69, 0xa777e0, 0x1add950, 0x18d3c04, 0x0, 0x0)
/usr/local/go/src/path/filepath/path.go:360 +0x2fc
path/filepath.walk(0x1615220, 0x14, 0xa777e0, 0x1af4870, 0x18d3c04, 0x0, 0x6729d0)
/usr/local/go/src/path/filepath/path.go:384 +0x204
path/filepath.Walk(0x1615220, 0x14, 0x1698c04, 0x1c, 0x179c940)
/usr/local/go/src/path/filepath/path.go:406 +0xe8
storj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite(0x161edc0, 0xcf462e2d, 0xbfdbb888, 0x650a3064, 0x1, 0x1046d78, 0x179c960, 0x0, 0x0)
/go/src/storj.io/storj/storagenode/orders/store.go:198 +0xdc
storj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore(0x1672000, 0xa758f8, 0x1af6000, 0xcf462e2d, 0xbfdbb888, 0x650a3064, 0x1, 0x1046d78)
/go/src/storj.io/storj/storagenode/orders/service.go:398 +0x300
storj.io/storj/storagenode/orders.(*Service).SendOrders(0x1672000, 0xa75a78, 0x18dbbb8, 0xcf462e2d, 0xbfdbb888, 0x650a3064, 0x1, 0x1046d78)
/go/src/storj.io/storj/storagenode/orders/service.go:192 +0x168
storj.io/storj/storagenode/orders.(*Service).Run.func1(0xa75a78, 0x18dbbb8, 0xa75a78, 0x18dbbb8)
/go/src/storj.io/storj/storagenode/orders/service.go:139 +0x84
storj.io/common/sync2.(*Cycle).Run(0x1610b70, 0xa758f8, 0x172fb60, 0x18dba50, 0x0, 0x0)
/go/pkg/mod/storj.io/common@v0.0.0-20200925121432-61f74bdf4b5c/sync2/cycle.go:92 +0x134
storj.io/common/sync2.(*Cycle).Start.func1(0x1599f40, 0x0)
/go/pkg/mod/storj.io/common@v0.0.0-20200925121432-61f74bdf4b5c/sync2/cycle.go:71 +0x34
golang.org/x/sync/errgroup.(*Group).Go.func1(0x17a8e10, 0x179c5c0)
/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:57 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go
/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:54 +0x50
2020-10-20T04:42:23.658Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-10-20T04:42:23.662Z INFO Operator email {"Address": "swaggyroc1@gmail.com"}
2020-10-20T04:42:23.662Z INFO Operator wallet {"Address": "0xC1E4aD79C6F4FD21C2e09a09639E57D002552Dc5"}
2020-10-20T04:42:24.591Z INFO Telemetry enabled
2020-10-20T04:42:24.614Z INFO db.migration Database Version {"version": 45}
2020-10-20T04:42:25.345Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-10-20T04:42:26.264Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-10-20T04:42:26.264Z INFO bandwidth Performing bandwidth usage rollups
2020-10-20T04:42:26.265Z INFO Node 12p8vXhGUUoL5p3Gw7vi68VfZ3D9hgdWniSY22sDVf1E8uTedBv started
2020-10-20T04:42:26.265Z INFO Public server started on [::]:28967
2020-10-20T04:42:26.265Z INFO Private server started on 127.0.0.1:7778
2020-10-20T04:42:26.266Z INFO trust Scheduling next refresh {"after": "5h9m33.468063243s"}

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 14002:14002 \
    -e WALLET="0x" \
    -e EMAIL="Email@email.com" \
    -e ADDRESS="Ip:28967" \
    -e STORAGE="900GB" \
    --mount type=bind,source="/home/pi/storagenode",destination=/app/identity \
    --mount type=bind,source="/mnt/storj",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

Looks like you are seeing the same issue as reported here:

@ifraixedes was looking into this last I saw, and said an upcoming release might fix it. Not sure what's to be done in the meantime.

Actually, I think @baker is correct with this, as your error does match the error he referenced.

@Gerarardit, can you use backticks to trigger a markdown code block?

Here is an example for one of my nodes:

docker run -d --restart always --stop-timeout 300 \
    -p 28967:28967 \
    -p 14002:14002 \
    -p 127.0.0.1:6002:5999 \
    -e WALLET="<your_address>" \
    -e EMAIL="<your_email>" \
    -e ADDRESS="<your_address>:28967" \
    -e STORAGE="1.6TB" \
    --mount type=bind,source="/srv/dev-disk-by-label-DataDrive02/StorjNode02/Identity",destination=/app/identity \
    --mount type=bind,source="/srv/dev-disk-by-label-DataDrive02/StorjNode02/Data",destination=/app/config \
    --name StorjNode-02 storjlabs/storagenode:latest \
    --debug.addr=":5999"

The code block prevents the site from modifying some of the symbols.

I would go ahead and check your command to make sure that the - and -- arguments haven't been mangled (and if so, use a text editor that won't play games with you, such as vim, Notepad, VS Code, Atom, etc.).
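A quick way to check, assuming you keep the command in a script (the path here is just an example), is to scan it for non-ASCII characters such as en-dashes and curly quotes:

grep -nP '[^\x00-\x7F]' ~/storagenode-run.sh

If that prints any lines, the dashes or quotes were converted somewhere along the way.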

Find the order limit that breaks it and remove it, but that’s difficult.

A much easier approach is:

  1. Stop the storage node process.
  2. Move all the order files from the orders/unsent directory to another directory.
  3. Start the storage node process again and check that it no longer crashes.

The problem with this is that those order limits will never be sent, so the SNO won't get paid for that bandwidth usage; hence this is only a temporary workaround.

NOTE: I'm not sure that this will fix the issue even temporarily, so each SNO should assess whether it's worth trying.
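A minimal shell sketch of steps 1-3, assuming a container named storagenode with its data directory mounted at /mnt/storj as in the command posted above (the quarantine path is just an example):

docker stop -t 300 storagenode
mkdir -p /mnt/storj/orders-quarantine
mv /mnt/storj/orders/unsent/* /mnt/storj/orders-quarantine/
docker start storagenode
docker logs --tail 20 --follow storagenode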

OK, thanks. For now it hasn't crashed in 5 minutes, so I think it's solved; before, it crashed every 10 seconds. I'll keep you up to date if it crashes again or if it keeps working.

With release v1.15.3 we expect this issue to be fixed.
If your node isn't on that release or above, please update it; if it is and you're still seeing this error, please report it again.
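For the Docker setup, the update is roughly this (assuming a container named storagenode):

docker pull storjlabs/storagenode:latest
docker stop -t 300 storagenode
docker rm storagenode

and then re-create the container with the same docker run command you used before.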

Thank you so much for your collaboration.

Hi.

I've had the same problem and solved it by moving all the files to another directory,

but now I don't know whether I need to keep those files, for how long, when I have to return them to the original folder, or whether I can just remove them.

Order files expire after 48h, so these files are probably good for the trash now…
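If you want to clean them up after that window, something like this should do it, assuming you moved them to /mnt/storj/orders-quarantine as in the sketch above (2880 minutes = 48 hours):

find /mnt/storj/orders-quarantine -type f -mmin +2880 -delete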

@camulodunum I confirm that @Pac is right

I have the same issue today; it restarts every 7 seconds. My node version is 1.16.1. Below is the log:
2020-11-29T21:48:31.486Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-11-29T21:48:31.507Z INFO Operator email {"Address": "xxx@gmail.com"}
2020-11-29T21:48:31.507Z INFO Operator wallet {"Address": "0x"}
2020-11-29T21:48:31.808Z INFO Telemetry enabled
2020-11-29T21:48:31.828Z INFO db.migration Database Version {"version": 46}
2020-11-29T21:48:32.392Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-11-29T21:48:33.077Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-11-29T21:48:33.077Z INFO bandwidth Performing bandwidth usage rollups
2020-11-29T21:48:33.077Z INFO Node 12iXXXXXXXXXXXXXXXXXXXXXXXXXXXXXoLK started
2020-11-29T21:48:33.077Z INFO Public server started on [::]:28967
2020-11-29T21:48:33.077Z INFO Private server started on 127.0.0.1:7778
2020-11-29T21:48:33.077Z INFO trust Scheduling next refresh {"after": "4h51m45.079049338s"}
2020-11-29T21:48:39.366Z ERROR piecestore:cache error getting current used space: {"error": "readdirent: input/output error; readdirent: input/output error", "errorVerbose": "group:\n--- readdirent: input/output error\n--- readdirent: input/output error"}
2020-11-29T21:48:39.366Z ERROR services unexpected shutdown of a runner {"name": "piecestore:cache", "error": "readdirent: input/output error; readdirent: input/output error", "errorVerbose": "group:\n--- readdirent: input/output error\n--- readdirent: input/output error"}
2020-11-29T21:48:39.866Z ERROR servers unexpected shutdown of a runner {"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: readdirent: input/output error; readdirent: input/output error
Can anyone direct me on what to do? Thanks.

Looks like it could be a hard drive problem.

Are there any ways to salvage the data? I checked the drive (since this is an array) and it seems there is no failed device. Here is the result of mdadm --detail:
root@evoiseeu:~# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Mon Nov  2 11:03:18 2020
        Raid Level : raid0
        Array Size : 2929890816 (2794.16 GiB 3000.21 GB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

       Update Time : Mon Nov  2 11:03:18 2020
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : xxxxxxx.xxxxxxxxx.ca:0  (local to host xxxxxxxx.xxxxxxxxxx.ca)
              UUID : 06b843d3:03887eed:b55d5ad4:411d0457
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd

And here is the result of mdstat:
cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid0 sdb[0] sdd[2] sdc[1]
      2929890816 blocks super 1.2 512k chunks

unused devices: <none>

Please run fsck.
Also, RAID0 is a direct way to lose all data at once.
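A rough sketch of the checks, assuming the array is /dev/md0 with an ext4 filesystem mounted at /mnt/storj (adjust device names and paths to your setup):

docker stop -t 300 storagenode
umount /mnt/storj
fsck -f /dev/md0
dmesg | grep -iE 'i/o error|md0' | tail -20
smartctl -H /dev/sdb    # repeat for /dev/sdc and /dev/sdd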

I tried fsck and it returned no disk errors. Anyway, I ended up moving the data to another disk, switching the RAID0 to RAID1, and copying the data back to the RAID1, and everything went back online. Thanks.
