Reboot loop after upgrade

After upgrading to v1.15.3, the storage node works for some time (about 10 min), then logs this:

2020-10-27T18:06:57.360696435Z 2020-10-27T18:06:57.360Z INFO    piecestore      upload started  {"Piece ID": "HL5CDHVUZ2SS7WFA36QFTMHVS7BQOQK56SWKO6OY4GGOQYZT5VGQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 983028596352}
2020-10-27T18:06:57.377417673Z 2020-10-27T18:06:57.377Z INFO    piecestore      uploaded        {"Piece ID": "HL5CDHVUZ2SS7WFA36QFTMHVS7BQOQK56SWKO6OY4GGOQYZT5VGQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT"}
2020-10-27T18:06:57.434510983Z 2020-10-27T18:06:57.433Z ERROR   piecestore:cache        error getting current used space:       {"error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gt/367w5ejuvljgu22lfiiuebswq2spdlkcrs3wthmtpjcdnf57wa.sj1: bad message; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/bh/mq7gmkve5yx24emqrj5xmn4mf3apdm5w2xduhaifhd22fpybra.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gt/367w5ejuvljgu22lfiiuebswq2spdlkcrs3wthmtpjcdnf57wa.sj1: bad message\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:787\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:280\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:489\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:654\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:57\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/bh/mq7gmkve5yx24emqrj5xmn4mf3apdm5w2xduhaifhd22fpybra.sj1: bad message\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:787\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:280\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:489\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:654\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:57\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-27T18:06:57.436759259Z 2020-10-27T18:06:57.436Z ERROR   services        unexpected shutdown of a runner {"name": "piecestore:cache", "error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gt/367w5ejuvljgu22lfiiuebswq2spdlkcrs3wthmtpjcdnf57wa.sj1: bad message; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/bh/mq7gmkve5yx24emqrj5xmn4mf3apdm5w2xduhaifhd22fpybra.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gt/367w5ejuvljgu22lfiiuebswq2spdlkcrs3wthmtpjcdnf57wa.sj1: bad message\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:787\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:280\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:489\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:654\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:57\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/bh/mq7gmkve5yx24emqrj5xmn4mf3apdm5w2xduhaifhd22fpybra.sj1: bad message\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:787\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:280\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:489\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:654\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:57\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

The node then crashes and the container restarts.
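
To see at a glance which files the node is choking on, the paths can be pulled out of the error text. This is only a rough sketch: it assumes each log entry sits on a single line and that the message format is exactly `lstat <path>: bad message` as in the log above. The function name `failing_paths` is made up for illustration.

```python
import re

# Matches "lstat <path>: bad message" as it appears in the error text above.
LSTAT_ERROR = re.compile(r"lstat (\S+?): bad message")


def failing_paths(log_text: str) -> list[str]:
    """Return the unique paths named in 'lstat ...: bad message' errors, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving order; the same path is
    # repeated in both the "error" and "errorVerbose" fields of each entry.
    return list(dict.fromkeys(LSTAT_ERROR.findall(log_text)))
```

Feeding it the saved container log (for example `failing_paths(open("node.log").read())`, where `node.log` is a hypothetical file name) should list each affected blob file once.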

I can't confirm yet that this is an issue, since my nodes haven't updated yet. Did this happen on all your Linux nodes or just one?

Not that this helps you in any way, but one of my 3 Linux nodes upgraded a couple of hours ago. It seemed to start right back up without issue.

Check your disk for errors. Free space could be marked as allocated, which fsck could detect. You are a pro, so you know the best methods already :slight_smile:
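
Before (not instead of) running fsck, a read-only scan can show how widespread the damage is. The sketch below walks a directory tree and calls `lstat` on every file, which is the same call that fails in the log above, and collects whatever errors come back. It is an illustration under assumptions: `find_unreadable_entries` is a made-up name, and the path you pass in (for example the node's `storage/blobs` directory) depends on your setup.

```python
import errno
import os


def find_unreadable_entries(root: str) -> list[tuple[str, str]]:
    """Walk root and return (path, errno name) for every entry that cannot be listed or lstat'ed."""
    problems: list[tuple[str, str]] = []

    def on_walk_error(err: OSError) -> None:
        # os.walk calls this when a directory itself cannot be listed.
        problems.append((str(err.filename), errno.errorcode.get(err.errno, str(err.errno))))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # same syscall that reports "bad message" (EBADMSG) in the log
            except OSError as err:
                problems.append((path, errno.errorcode.get(err.errno, str(err.errno))))
    return problems
```

An empty result only means every entry could be stat'ed; it does not prove the filesystem is clean, so fsck on the unmounted volume is still the real check.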

Thanks! :slightly_smiling_face:
I already fixed this issue; yes, it was a filesystem error.
But I wonder… yesterday I had database corruption in another location, today main filesystem corruption… I will definitely investigate the root cause.

Please also share the details of that setup.

Yes, sure, I will create a new thread in “Production Enthusiasts” and post all detailed information.
