Node restarts once a day

Ryzen_X · June 14, 2023, 11:44am

For a week now, my node has been restarting once a day, and I don’t know why.
The node is full (3 TB) and has been running under Docker for over a year without problems.
I just managed to grab its logs:

github.com/spf13/cobra.(*Command).execute:852
github.com/spf13/cobra.(*Command).ExecuteC:960
github.com/spf13/cobra.(*Command).Execute:897
storj.io/private/process.ExecWithCustomOptions:113
main.main:30
runtime.main:250
2023-06-14 11:19:33,590 INFO stopped: storagenode (exit status 1)
2023-06-14 11:19:33,591 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2023-06-14 11:19:52,641 INFO Set uid to user 0 succeeded
2023-06-14 11:19:52,654 INFO RPC interface 'supervisor' initialized
2023-06-14 11:19:52,654 INFO supervisord started with pid 1
2023-06-14 11:19:53,656 INFO spawned: 'processes-exit-eventlistener' with pid 11
2023-06-14 11:19:53,658 INFO spawned: 'storagenode' with pid 12
2023-06-14 11:19:53,660 INFO spawned: 'storagenode-updater' with pid 13
2023-06-14T11:19:53.673Z INFO Configuration loaded {"Process": "storagenode-updater", "Location": "/app/config/config.yaml"}

But I don’t think it tells me much. What can I do to find out what is going on? Thanks

JWvdV · June 14, 2023, 12:36pm

Don’t you see any log lines before it restarts?
We only see the tail end of the error that is probably causing it,
so we need a longer log.
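Once the full log has been saved to a file (for example with `docker logs storagenode > node.log 2>&1`, assuming the container is named `storagenode`), a minimal Go sketch like this one can pull out the lines leading up to each crash. It keys on the supervisord text `storagenode (exit status 1`, which appears in both excerpts in this thread; the program name `crashctx` and the 20-line window are arbitrary, hypothetical choices.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// contextBefore is how many preceding lines to keep (an arbitrary choice).
const contextBefore = 20

// crashMarker matches the supervisord lines logged when storagenode exits
// with status 1, e.g. "exited: storagenode (exit status 1; not expected)".
const crashMarker = "storagenode (exit status 1"

// linesBeforeCrash returns, for each line containing crashMarker, that line
// plus up to n lines that precede it.
func linesBeforeCrash(lines []string, n int) [][]string {
	var out [][]string
	for i, l := range lines {
		if strings.Contains(l, crashMarker) {
			start := i - n
			if start < 0 {
				start = 0
			}
			out = append(out, lines[start:i+1])
		}
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: crashctx <logfile>")
		os.Exit(2)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	var lines []string
	sc := bufio.NewScanner(f)
	// errorVerbose lines are long; raise the 64 KiB default line limit as a precaution.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, block := range linesBeforeCrash(lines, contextBefore) {
		fmt.Println("----")
		for _, l := range block {
			fmt.Println(l)
		}
	}
}
```

Run it with `go run crashctx.go node.log`; each `----` block ends with the exit line, so the ERROR entries just above it are the ones worth posting.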

Ryzen_X · June 14, 2023, 2:33pm

OK, thanks, I will try.

Ryzen_X · June 19, 2023, 9:20am

I managed to get the logs. Do you think this is definitely a disk problem? I repaired the bandwidth.db database, apparently successfully, and I did not have this problem before the database issue.

2023-06-19T08:50:21.528Z ERROR piecestore:cache error getting current used space: {"process": "storagenode", "error": "filewalker: unrecoverable error accessing data on the storage file system (path=config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1; error=lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1: structure needs cleaning). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity", "errorVerbose": "filewalker: unrecoverable error accessing data on the storage file system (path=config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1; error=lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1: structure needs cleaning). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:718\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

2023-06-19T08:50:21.536Z ERROR services unexpected shutdown of a runner {"process": "storagenode", "name": "piecestore:cache", "error": "filewalker: unrecoverable error accessing data on the storage file system (path=config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1; error=lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1: structure needs cleaning). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity", "errorVerbose": "filewalker: unrecoverable error accessing data on the storage file system (path=config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1; error=lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1: structure needs cleaning). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:718\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

Error: filewalker: unrecoverable error accessing data on the storage file system (path=config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1; error=lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/mb/qzowstduiadn4u5e3s6hu5ust2ekpguqsofod6whwwwpe3m7eq.sj1: structure needs cleaning). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity

2023-06-19 08:50:21,799 INFO exited: storagenode (exit status 1; not expected)

2023-06-19 08:50:22,813 INFO spawned: 'storagenode' with pid 245

2023-06-19 08:50:22,852 WARN received SIGQUIT indicating exit request

2023-06-19 08:50:22,857 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die

2023-06-19T08:50:22.881Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}

2023-06-19 08:50:22,883 INFO stopped: storagenode-updater (exit status 0)

2023-06-19 08:50:22,904 INFO stopped: storagenode (terminated by SIGTERM)

2023-06-19 08:50:22,904 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

2023-06-19 08:50:41,059 INFO Set uid to user 0 succeeded

2023-06-19 08:50:41,072 INFO RPC interface 'supervisor' initialized

nerdatwork · June 19, 2023, 6:47pm

Yes. It clearly states