Unexpected shutdown of a runner: not a directory

Hi there,

Once a day I receive the following error (and only this single one):

2022-02-01T14:57:02.630Z        ERROR   services        unexpected shutdown of a runner {"name": "piecestore:cache", "error": "readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k: not a directory"}

I assume this is a stability issue with Docker, but I'm not 100% sure. The forum posts I've found so far have not been helpful.

At the very least the message is confusing, as it could point to a) a read/write issue with the HDD, b) a programmatic issue ("not a directory"), or c) a services / Docker issue ("shutdown of a runner").

Please help me clear up the confusion. :slight_smile:

BTW, it seems the node has restarted on its own - in the dashboard I can see it has been up for around 22 minutes. I never saw this before the latest release, 1.47.3.

Hi @Bivvo,
If the node has restarted on its own, please provide 20 to 30 log entries either side of the restart time. We can then work out whether they are relevant to the error listed.

I think this should help:

2022-02-01T14:56:56.565Z	INFO	piecestore	upload started	{"Piece ID": "CUXOQUI4W27VVUTZQRWRYEPMQZJ5344HUBXT7NSKNVMGBTV6E2CA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 8109283744708}
2022-02-01T14:56:57.058Z	INFO	piecestore	uploaded	{"Piece ID": "CUXOQUI4W27VVUTZQRWRYEPMQZJ5344HUBXT7NSKNVMGBTV6E2CA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 181504}
2022-02-01T14:56:57.603Z	INFO	piecestore	download started	{"Piece ID": "5KRXHY6R57MV5QFEHSN726WRRAI544VJOPF5DVTSSXPFEMY64VOQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2022-02-01T14:56:57.829Z	INFO	piecestore	downloaded	{"Piece ID": "5KRXHY6R57MV5QFEHSN726WRRAI544VJOPF5DVTSSXPFEMY64VOQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2022-02-01T14:57:01.204Z	INFO	piecestore	upload started	{"Piece ID": "QQTQ5E3ZDUFFD6LODJ24NVADFGWIUXTK63NY2HSATXEHIP2BO6QQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 8109283562692}
2022-02-01T14:57:02.296Z	INFO	piecestore	uploaded	{"Piece ID": "QQTQ5E3ZDUFFD6LODJ24NVADFGWIUXTK63NY2HSATXEHIP2BO6QQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 592128}
2022-02-01T14:57:02.628Z	ERROR	piecestore:cache	error getting current used space: 	{"error": "readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k: not a directory"}
2022-02-01T14:57:02.630Z	ERROR	services	unexpected shutdown of a runner	{"name": "piecestore:cache", "error": "readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k: not a directory"}
2022-02-01T14:57:02.634Z	INFO	piecestore	upload canceled	{"Piece ID": "YLPLWFA46MOL35Q6R5ZYBXL5OGSOKC3T7ZIAU4YH67OKIXGHADAA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 0}
2022-02-01T14:57:02.676Z	INFO	piecestore	upload canceled	{"Piece ID": "MI7IFBKIBMZJEMIXH7T6GM22OBAHIU4S2S5ANPJME5I2RP5OMCMA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT_REPAIR", "Size": 1843200}
2022-02-01T14:57:02.689Z	INFO	piecestore	upload canceled	{"Piece ID": "TTGD7AT667HILMIR2SFSM7UR5Y4EMJULTIUQIG2WMV6WBZ3A5EQQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 311296}
2022-02-01T14:57:02.701Z	INFO	piecestore	downloaded	{"Piece ID": "NSTGGXNGUALZ56BQQQTDRTX7F7MLDPJX2KNCLSORIQVXUQJJUJNQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
2022-02-01T14:57:02.791Z	INFO	piecestore	upload canceled	{"Piece ID": "ISNKTNW4OMTNUVWB4STCKZHMBRIM4G3BIB6LDW4VKSLJBD363IBA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 311296}
2022-02-01T14:57:02.902Z	INFO	piecestore	upload canceled	{"Piece ID": "HFGID4UCCV7T5LDW4AWU3GTAJHL4XIL5MI6MIWXSE5B64UNYXRAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 1056768}
2022-02-01T14:57:03.022Z	INFO	piecestore	upload canceled	{"Piece ID": "HH37ZC3HWIHBFWUVLPCEYDOWS4BSJIUQOCTAUEFUK4VZ5DR5T4EA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 1318912}
Error: readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k: not a directory
2022-02-01T14:57:04.694Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2022-02-01T14:57:04.701Z	INFO	Operator email	{"Address": "xxx@gmail.com"}
2022-02-01T14:57:04.701Z	INFO	Operator wallet	{"Address": "xxx"}
2022-02-01T14:57:05.721Z	INFO	Telemetry enabled	{"instance ID": "xxx"}
2022-02-01T14:57:05.979Z	INFO	db.migration	Database Version	{"version": 53}
2022-02-01T14:57:06.462Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.
2022-02-01T14:57:07.284Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.
2022-02-01T14:57:07.285Z	INFO	Node 1veqEG5xuBNkt6cK1NbmL5fbNxXq6E7JzB7CiiaptoapoWnB1i started
2022-02-01T14:57:07.285Z	INFO	Public server started on [::]:28967
2022-02-01T14:57:07.286Z	INFO	Private server started on 127.0.0.1:7778
2022-02-01T14:57:07.286Z	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
2022-02-01T14:57:07.455Z	INFO	piecestore	download started	{"Piece ID": "IW5CBB6SIMLOFRFO4PS2AFPUCLJSUA5UPUUQT7SF7WCCWNILBZZA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-01T14:57:07.917Z	INFO	trust	Scheduling next refresh	{"after": "2h7m28.998479763s"}
2022-02-01T14:57:07.919Z	INFO	bandwidth	Performing bandwidth usage rollups
2022-02-01T14:57:08.161Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "QZ5KTCJ7ZMOKGAMBPAPAF7RFNWPGOANXVZAJKSHT2GL6S2J3Z26Q", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-02-01T14:57:08.286Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "JA4FZOG6QNQV5YWZMFXSJIUOYGHO4IRJW2RMZ3ZPJXMNCETNNTCA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-02-01T14:57:08.293Z	INFO	piecestore	downloaded	{"Piece ID": "IW5CBB6SIMLOFRFO4PS2AFPUCLJSUA5UPUUQT7SF7WCCWNILBZZA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}

It seems the config file could not be read for a second?
And on a very regular basis I see entries like these (also part of the log extract above):

2022-02-01T14:57:07.919Z	INFO	bandwidth	Performing bandwidth usage rollups
2022-02-01T14:57:08.161Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "QZ5KTCJ7ZMOKGAMBPAPAF7RFNWPGOANXVZAJKSHT2GL6S2J3Z26Q", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-02-01T14:57:08.286Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "JA4FZOG6QNQV5YWZMFXSJIUOYGHO4IRJW2RMZ3ZPJXMNCETNNTCA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

On my second node (also a different HDD) I see the following, though not very often either:

2022-02-01T18:05:40.657Z	INFO	piecestore	uploaded	{"Piece ID": "HFZJVYGEZQEWSYQOSKEXCLJYYXQQ3BVKA2BI2Q3MZPGW7WZJRCIA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 181504}
2022-02-01T18:05:43.056Z	ERROR	orders	cleaning filestore archive	{"error": "order: lstat config/orders/archive/archived-orders-1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE-1643569200000000000-1643576684730631097-ACCEPTED.v1: input/output error", "errorVerbose": "order: lstat config/orders/archive/archived-orders-1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE-1643569200000000000-1643576684730631097-ACCEPTED.v1: input/output error\n\tstorj.io/storj/storagenode/orders.(*FileStore).CleanArchive.func1:366\n\tpath/filepath.walk:438\n\tpath/filepath.Walk:505\n\tstorj.io/storj/storagenode/orders.(*FileStore).CleanArchive:364\n\tstorj.io/storj/storagenode/orders.(*Service).CleanArchive:163\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func2:141\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-02-01T18:05:57.706Z	INFO	piecestore	upload started	{"Piece ID": "LMVRGJMRKJWD7PDWUPYZ353DQCNBBAG535YZTQTUGWYD24KUMCUA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 946898538942}

I have a feeling it might have something to do with the file systems of the two HDDs:
node 1: Apple_APFS
node 2: Apple_HFS

Both HDDs are connected with an Icy Box directly via USB-C / Thunderbolt to a Mac mini M1.

Unfortunately this means that the disk has issues, or that the storagenode doesn't have access rights to this directory.

I would suggest stopping the storagenode, then checking and fixing disk errors.
The next step would be to check access rights.
But I have a feeling that this directory is now a file. In that case you need to remove the corrupted file.
Of course it would mean that your node has lost the pieces which were under this directory, and your node could be disqualified.
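
For example, you could check from the host whether the path from the error is still a directory or has turned into a regular file. This is just a sketch - /Volumes/yourdrive is a placeholder for wherever your storage folder is actually mounted:

ls -ld /Volumes/yourdrive/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k
# a mode starting with "d" means it is still a directory; a leading "-" means it is now a plain file
file /Volumes/yourdrive/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/5k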


Is it good news that I can find the directory and open it without problems?
I'll do a full disk check and reset the access rights. Hope it helps.
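
For reference, the disk check on macOS can be done with diskutil - a sketch, with /Volumes/yourdrive standing in for the actual volume name:

diskutil verifyVolume /Volumes/yourdrive
# if problems are reported, repair the volume (with the node stopped first)
diskutil repairVolume /Volumes/yourdrive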

Update: access rights have been freshly set and the disk check reported no errors.
Node 1 has been online for a few minutes and the log files are looking good so far.
Node 2 has been online since yesterday after resetting the access rights - no errors so far.
Crossing fingers.
Thx for your help @Stob @Alexey


Anyway, I do not understand an error message like this one:

open config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/bd/zyuv4flmiee4gnwuovjwnhd2yud3ezpqq3i73zmqaumswn6mgq.sj1: not a directory

Should that be a directory (*.sj1)? It isn't on my second, fresh node either - those are always files.

.sj1 should be a file.


Then it's weird. :sweat_smile: Hmm.
Btw, the FILE is accessible and readable (and so is its directory).

Is this the same node you had issues with when you used SMB? And you ended up moving to an M1 Mac?

You already had file system issues as a result of using SMB, which probably explains the missing files and other weird things you're seeing.

I also notice you posted this.

If your node takes more than 10 seconds to stop, the process will be force-killed this way. That could cause all sorts of problems as well. Please always stop your node with:

docker stop -t 300 storagenode

before quitting Docker or shutting down/restarting the machine.
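
So a minimal shutdown routine would look like this (a sketch; the container name storagenode matches the one used above):

docker stop -t 300 storagenode   # give the node up to 300 seconds to finish in-flight transfers
docker rm storagenode            # only needed if you plan to recreate the container with docker run afterwards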


Good point, yes, this could be the original root cause. But I still don't understand the error messages, as the files themselves are there (maybe corrupt?). Then again, the node may well carry these legacy issues for a small minority of files.

Valid point. I'll extend that routine with the docker stop command - I had already changed it to run only when one of the issues described in that post occurred, not on a daily basis. It hasn't been triggered since then, but it could explain issues with single files.

It may have resulted in partially written files that can no longer be read. Prior file system corruption may have done the same.

Just to double-check: how is your data HDD connected to the node machine (still the M1 Mac?) now? And what file system are you using?
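
(If you're unsure, something like this should show both - /Volumes/yourdrive is just a placeholder:)

diskutil info /Volumes/yourdrive | grep -E 'File System|Protocol'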

Yes, the setup is still as described above.

OK, well, I don't necessarily see a problem with the current setup, as long as the USB connection is reliable and never interrupted (that includes interruptions due to power-saving settings - not sure how relevant that is on a Mac).
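
If you want to rule that out, macOS exposes the disk-sleep setting via pmset (a sketch, not Storj-specific advice; the second command needs admin rights):

pmset -g | grep disksleep    # current "put hard disks to sleep" timeout in minutes (0 = never)
sudo pmset -a disksleep 0    # disable disk sleep for all power sources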


Hi there, just a quick update, finally:

  • I've done a full disk check of all HDDs → all without any errors or inconsistencies. That's good so far.

  • Therefore the issues mentioned must be linked to the instability of the HDD connection from when it was still connected via SMB a couple of weeks ago. This most likely also means that some individual files are corrupt - let's see what that brings. From what I can tell, it only has a temporary impact on the suspension score whenever one of the broken files happens to be selected. If it stays like this, that's fine for me.

So far so good - the file and/or directory issues can be explained.


Sorry to bother you again, guys.

I am still experiencing issues, now with the second node, which is just a couple of days old.

The excerpt of the logs is:

2022-02-03T11:43:23.378Z INFO Private server started on 127.0.0.1:7778

2022-02-03T11:43:23.378Z INFO failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.

2022-02-03T11:43:24.669Z INFO trust Scheduling next refresh {"after": "3h34m10.368649803s"}

2022-02-03T11:43:24.671Z INFO bandwidth Performing bandwidth usage rollups

2022-02-03T11:43:28.900Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: readdirent config/storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/cx: not a directory", "errorVerbose": "pieces error: filestore error: readdirent config/storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/cx: not a directory\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

2022-02-03T11:43:30.305Z ERROR db Unable to read the disk, please verify the disk is not corrupt

2022-02-03T11:43:30.487Z ERROR piecestore:cache error getting current used space:  {"error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aw/3uf4krdzp6liv562wzo7mpjuobihtzx4xbgs6ehv7tggmrb4gq.sj1: not a directory; readdirent config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/43: not a directory", "errorVerbose": "group:\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aw/3uf4krdzp6liv562wzo7mpjuobihtzx4xbgs6ehv7tggmrb4gq.sj1: not a directory\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- readdirent config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/43: not a directory"}

2022-02-03T11:43:30.488Z ERROR services unexpected shutdown of a runner {"name": "piecestore:cache", "error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aw/3uf4krdzp6liv562wzo7mpjuobihtzx4xbgs6ehv7tggmrb4gq.sj1: not a directory; readdirent config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/43: not a directory", "errorVerbose": "group:\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aw/3uf4krdzp6liv562wzo7mpjuobihtzx4xbgs6ehv7tggmrb4gq.sj1: not a directory\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- readdirent config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/43: not a directory"}

2022-02-03T11:43:30.488Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}

Error: lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aw/3uf4krdzp6liv562wzo7mpjuobihtzx4xbgs6ehv7tggmrb4gq.sj1: not a directory; readdirent config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/43: not a directory

2022-02-03T11:43:31.005Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}

It’s another HDD and, as mentioned, without errors (fsck checked).

2022-02-03T10:00:38.185Z        ERROR   retain  retain pieces failed    {"error": "retain: context canceled", "errorVerbose": "retain: context canceled
\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:426
\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:222
\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
...
Error: readdirent config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/6m: input/output error
...
2022-02-03T11:28:14.215Z        ERROR   db      Unable to read the disk, please verify the disk is not corrupt
2022-02-03T11:28:20.785Z        ERROR   db      Unable to read the disk, please verify the disk is not corrupt

In parallel there was a "no access to db" issue (among others) on the first node - a restart of Docker solved it for node 1. But unfortunately node 2 is not starting anymore…

My feeling, since the files and directories can be opened manually, is that it might be an issue with Docker Desktop for Mac, OR with the file system chosen for both HDDs (Apple HFS etc.), OR with the fact that the db files are stored on the macOS SSD and not on the individual node HDDs.

What I could easily do is switch both nodes to the RPi and connect the Icy Dock (with an external power supply, of course) to it directly. But I assume the disks won't be readable there with the Apple file systems, so I'd need to reformat the disks and rsync anyway…
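
The copy itself would then be something along these lines (a rough sketch with placeholder paths, run repeatedly until the delta is small, then once more with the node stopped):

rsync -aP /Volumes/node2/ /mnt/storj/node2/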

Please let me know what you think. @BrightSilence @Alexey @Stob
I am confused and lost for now.

Update 1: uh, I have an idea about what could be causing the trouble: Time Machine backups → one ran 10 minutes before the latest issues occurred, so it could have affected the dbs on the macOS SSD. I've disabled Time Machine for now. Anyway, the second node still does not restart at all.
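
(For reference, instead of disabling Time Machine entirely, the node paths could probably just be excluded; the paths below are placeholders:)

sudo tmutil addexclusion -p /Volumes/node1 /Volumes/node2
tmutil isexcluded /Volumes/node1    # verify the exclusion took effect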

Update 2: unmounted + remounted the HDD of node 2 → the node is up and running again. That's really weird.

Is it possible it's pulling the AMD64 image and running under emulation? It wouldn't surprise me if that caused issues similar to using network protocols. Maybe the architecture isn't detected correctly.

Hmm, how do I investigate that?

In the container:

docker exec -it storagenode uname -m
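
If that prints aarch64, the container is running natively on the M1; x86_64 would mean the amd64 image was pulled and is running under emulation.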

Or

docker images -a

Find the storagenode image ID and run:

docker inspect --format='{{.Architecture}}' [ImageIDHere]
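
This should print arm64 for the native Apple Silicon image; amd64 would confirm the emulation theory.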