Node is exiting after migration

One of my nodes was offline for 2 days because the drive holding the Docker image and the identity had gone “missing”.

I restored a backup of the identity to the server and pulled the latest image (the node had been running on a slightly older version before) from Docker Hub. Everything seemed to work fine for the first 2 hours, but then the container exited.

Looking at the logs, I see a lot of errors similar to this:

2022-03-29T06:42:47.087Z ERROR collector unable to delete piece {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "MC7C2T6Y2GSYHAH47S6OZL5CJ5QUI35ZUKEQUUDJQVVF54S3BOVA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

These log entries appear just before the container exits:

2022-03-29T06:42:47.601Z ERROR db Unable to read the disk, please verify the disk is not corrupt

(The disk that holds the DB and the stored data still seems to be writable.)

2022-03-29T06:42:48.220Z ERROR piecestore:cache error getting current used space: {"error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/s5/aoedwmw5ul5mvmwj2gr7axcn7mmfvhfynhvt4qgydsfqu6co7q.sj1: structure needs cleaning; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/sb/fzwsgkrsqknqbr4kignx4xdzhiu2vhq6hfnlqy7svwhl45m7ga.sj1: structure needs cleaning; lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/yp/yspmdbfmpvyd3xkjzscgkkeba4zsvnuexxjdnv6n7don24hgrq.sj1: structure needs cleaning; lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/iw/75kleegxtbsqok6gr7sqy7u26qqs4ijfv3h2wb4v6rus4bje6q.sj1: structure needs cleaning; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/4i/5chgkozsdxlq5pmczh2ipjlcx7hhdn5kmcj2zahcqizfdaw5ca.sj1: structure needs cleaning", "errorVerbose": "group:\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/s5/aoedwmw5ul5mvmwj2gr7axcn7mmfvhfynhvt4qgydsfqu6co7q.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/sb/fzwsgkrsqknqbr4kignx4xdzhiu2vhq6hfnlqy7svwhl45m7ga.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/yp/yspmdbfmpvyd3xkjzscgkkeba4zsvnuexxjdnv6n7don24hgrq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat 
config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/iw/75kleegxtbsqok6gr7sqy7u26qqs4ijfv3h2wb4v6rus4bje6q.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/4i/5chgkozsdxlq5pmczh2ipjlcx7hhdn5kmcj2zahcqizfdaw5ca.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-29T06:42:47.911Z ERROR contact:service ping satellite failed {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 2, "error": "ping satellite: check-in ratelimit: node rate limited by id", "errorVerbose": "ping satellite: check-in ratelimit: node rate limited by id\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:136\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:98\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-29T06:42:48.221Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}
2022-03-29T06:42:48.221Z ERROR contact:service ping satellite failed {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 2, "error": "ping satellite: context canceled", "errorVerbose": "ping satellite: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:136\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:98\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/s5/aoedwmw5ul5mvmwj2gr7axcn7mmfvhfynhvt4qgydsfqu6co7q.sj1: structure needs cleaning; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/sb/fzwsgkrsqknqbr4kignx4xdzhiu2vhq6hfnlqy7svwhl45m7ga.sj1: structure needs cleaning; lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/yp/yspmdbfmpvyd3xkjzscgkkeba4zsvnuexxjdnv6n7don24hgrq.sj1: structure needs cleaning; lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/iw/75kleegxtbsqok6gr7sqy7u26qqs4ijfv3h2wb4v6rus4bje6q.sj1: structure needs cleaning; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/4i/5chgkozsdxlq5pmczh2ipjlcx7hhdn5kmcj2zahcqizfdaw5ca.sj1: structure needs cleaning
2022-03-29 06:42:50,549 INFO stopped: storagenode (exit status 1)
2022-03-29 06:42:50,550 INFO waiting for processes to die
2022-03-29 06:42:50,550 INFO stopped: processes (terminated by SIGTERM)

This behavior repeats upon restarting the node.

What is causing this?
How should I go about fixing it?

Thanks for your assistance.

You need to check your disk for errors and fix them. The “structure needs cleaning” errors mean the kernel has detected corrupted filesystem metadata, so the filesystem on your disk is now in a corrupted state.
Please stop and remove the container:

docker stop -t 300 storagenode
docker rm storagenode

Then check your disk with appropriate tools for your OS.
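For example, on Linux you could first look for filesystem errors in the kernel log and then run a read-only check. The device name and mount point below are only examples; use the ones that back your storage location:

dmesg | grep -iE 'xfs|ext4|i/o error'   # look for filesystem or I/O errors reported by the kernel
umount /mnt/storj                       # the check must not run on a mounted filesystem
xfs_repair -n /dev/sdb1                 # XFS: no-modify mode, only reports problems
fsck -n /dev/sdb1                       # ext4 and others: check only, do not repair anything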


Thanks a lot, Alexey!

I removed the Docker container.
I unmounted the volume and ran xfs_repair on that partition. It had quite a few bad inodes, which were rebuilt.
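For reference, the repair step was roughly the following (device and mount point are placeholders, not my actual paths):

umount /mnt/storagenode      # make sure nothing is using the filesystem
xfs_repair /dev/sdb1         # repairs the metadata and rebuilds the bad inodes it finds
mount /mnt/storagenode       # remount it (assuming an entry in /etc/fstab)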

After re-creating the Docker container with the usual run command (see below), the setup worked flawlessly.
The node is now up 24h+ :slight_smile:
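In case it helps anyone else: re-creating the container is just the run command from the Storj docs with your own values; the wallet, email, address, size, and paths below are placeholders:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="mynode.example.com:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source=/mnt/storagenode/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storagenode,destination=/app/config \
    --name storagenode storj/storagenode:latest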
