My Storj node (v1.99.3) has been crashing repeatedly since April 9.
OS: Ubuntu 22 (installed on an SSD)
RAM: 32 GB
Storage: 6x 6 TB SATA HDDs in RAID 10 (ext4 on mdadm)
SMART tests on all drives show no errors, and mdadm --detail reports the array as healthy. I even replaced an HDD that had a bad sector and resynced the array. This rig runs only Storj, so interference from other software is unlikely.
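For reference, the health checks above were roughly along these lines (a sketch only; device and array names such as /dev/sda and /dev/md0 are placeholders for my actual ones):

```
# SMART: overall health plus the attributes that usually betray a failing disk
sudo smartctl -H /dev/sda
sudo smartctl -a /dev/sda | grep -E "Reallocated_Sector|Current_Pending|Offline_Uncorrectable"
sudo smartctl -t long /dev/sda     # long self-test; results appear later in smartctl -a

# mdadm: array state, failed/degraded members, resync progress
sudo mdadm --detail /dev/md0
cat /proc/mdstat
```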
The node logs show several errors:
```
ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "V623P7SWJNHFUFWFCEP3K2SHMVCKQTE4VSEC6EE673TDINHK26WQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 9728, "Remote Address": "79.127.223.129:35470", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:621\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:302\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}

ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "WP5FEJXT2ULUSOYC6ZTWXNGHO4LXZE4WRZ4MVMOAZO4SMPAIA2CA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 4352, "Remote Address": "79.127.201.210:41678", "error": "trust: rpc: tcp connector failed: rpc: context canceled", "errorVerbose": "trust: rpc: tcp connector failed: rpc: context canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}

ERROR piecestore error sending hash and order limit {"Process": "storagenode", "Piece ID": "ELSPLWLYTQW3WNHDP2UCJD2KBB4VXZSK5RRVZPYKMW7V26G235GA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR", "Offset": 0, "Size": 18688, "Remote Address": "49.13.226.201:47162", "error": "write tcp 172.17.0.3:28967->49.13.226.201:47162: write: broken pipe", "errorVerbose": "write tcp 172.17.0.3:28967->49.13.226.201:47162: write: broken pipe\n\tstorj.io/drpc/drpcstream.(*Stream).rawFlushLocked:401\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:462\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:408\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5:720\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "ELSPLWLYTQW3WNHDP2UCJD2KBB4VXZSK5RRVZPYKMW7V26G235GA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR", "Offset": 0, "Size": 18688, "Remote Address": "49.13.226.201:47162", "error": "write tcp 172.17.0.3:28967->49.13.226.201:47162: write: broken pipe", "errorVerbose": "write tcp 172.17.0.3:28967->49.13.226.201:47162: write: broken pipe\n\tstorj.io/drpc/drpcstream.(*Stream).rawFlushLocked:401\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:462\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:408\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5:720\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

ERROR piecestore upload failed {"Process": "storagenode", "Piece ID": "LNLE5DFR4QEWD4623AMCISYUKIJPLHGIM7XTFXRLWCJUCHF2RILA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.227:37068", "Size": 1900544, "error": "manager closed: unexpected EOF", "errorVerbose": "manager closed: unexpected EOF\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:225\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:171\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:226"}

ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

ERROR pieces failed to lazywalk space used by satellite {"Process": "storagenode", "error": "lazyfilewalker: signal: killed", "errorVerbose": "lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:83\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:718\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}

ERROR piecestore upload failed {"Process": "storagenode", "Piece ID": "J7EY3VSVH5Q37EQG4XMX6XKWD464H4ZGZRNTW3QB4WSZWR7IAO2A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Remote Address": "79.127.226.98:54252", "Size": 0, "error": "order: grace period passed for order limit", "errorVerbose": "order: grace period passed for order limit\n\tstorj.io/storj/storagenode/orders.(*FileStore).BeginEnqueue:86\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder:885\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:410\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
```
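The entry that actually shuts the node down is the piecestore monitor timing out after 1m0s while verifying writability of the storage directory. A quick way to see whether the data directory really is that slow to accept writes is to time a small synced write by hand (the path below is only an example; substitute the real storage location):

```
# time a tiny synchronous write inside the storage directory (example path)
time dd if=/dev/zero of=/mnt/storj/storage/writability-test.tmp bs=4k count=1 oflag=sync
rm /mnt/storj/storage/writability-test.tmp
```

If that write regularly takes anywhere near a minute while the array reports healthy, the slowdown is in the filesystem or array under load rather than a single failed disk, but that is only a hypothesis to test.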
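The lazyfilewalker line only says the child process was killed by a signal; the kernel log should show who sent it (the OOM killer is one common suspect, though that is just a guess until the log confirms it):

```
# look for kernel-initiated kills around the crash times
sudo dmesg -T | grep -i -E "killed process|out of memory"
sudo journalctl -k | grep -i -E "killed process|oom"
```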