Since a couple of weeks my nodes (2) stop or hang regularly. Sometimes every 24 hours. Starting or restarting solves the problem, but want to get more insight on why they stop/hang.
They are running on a QNAP with the latest firmware and docker version. Both nodes run each on their own USB drive.
I stopped watchtower to avoid any interference with other dockers.
Here is an example of a stop this morning:
2023-10-09T14:44:33Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “UY7BOYWE4OPZXU4OPDKASPJTHT5JSENH7CBSXE5V6IGWWPFI6IZQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1280, “Remote Address”: “184.104.224.98:34524”}
2023-10-09T14:44:34Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.88.3/storagenode_linux_amd64.zip”, “To”: “/tmp/storagenode_linux_amd64.2184206584.zip”}
2023-10-09T14:44:34Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-10-09T14:44:34Z INFO piecestore upload started {“process”: “storagenode”, “Piece ID”: “Y7LUWE6AHXHYUBDUNS7NCNNMIQV2ENEBNWZ2H36YYCUZBKAFGKOA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Available Space”: 5927494286464, “Remote Address”: “216.66.40.82:49314”}
2023-10-09T14:44:34Z INFO piecestore download started {“process”: “storagenode”, “Piece ID”: “UY7BOYWE4OPZXU4OPDKASPJTHT5JSENH7CBSXE5V6IGWWPFI6IZQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1280, “Remote Address”: “23.237.191.122:51590”}
2023-10-09T14:44:35Z INFO piecestore upload started {“process”: “storagenode”, “Piece ID”: “ZNZNVXOHC7XDZGHZJVRRGAG5UMYTRIBTNXYP4CZN6YHJRYBA3TRA”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “PUT”, “Available Space”: 5927494264448, “Remote Address”: “95.163.17.6:49449”}
2023-10-09T14:44:35Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “UY7BOYWE4OPZXU4OPDKASPJTHT5JSENH7CBSXE5V6IGWWPFI6IZQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1280, “Remote Address”: “23.237.191.122:51590”}
2023-10-09T14:44:35Z INFO Service restarted successfully. {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-10-09T14:44:35Z INFO Got a signal from the OS: “interrupt” {“process”: “storagenode”}
2023-10-09T14:44:35Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.87.1”}
2023-10-09T14:44:35Z INFO Download started. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.88.3/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.1765104221.zip”}
2023-10-09T14:44:35Z ERROR pieces:trash emptying trash failed {“process”: “storagenode”, “error”: “pieces error: filestore error: context canceled”, “errorVerbose”: “pieces error: filestore error: context canceled\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:161\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:316\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:406\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:83\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89”}
2023-10-09T14:44:35Z INFO lazyfilewalker.used-space-filewalker subprocess exited with status {“process”: “storagenode”, “satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “status”: -1, “error”: “signal: killed”}
2023-10-09T14:44:35Z INFO piecestore upload canceled {“process”: “storagenode”, “Piece ID”: “EBOSRSNE3ARVIBA7B4RNSKICGG2TDQULHUBOFFNNBK244RBHICOA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 2031616, “Remote Address”: “184.104.224.99:52482”}
2023-10-09T14:44:35Z ERROR pieces failed to lazywalk space used by satellite {“process”: “storagenode”, “error”: “lazyfilewalker: signal: killed”, “errorVerbose”: “lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:83\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2023-10-09T14:44:35Z INFO lazyfilewalker.used-space-filewalker starting subprocess {“process”: “storagenode”, “satelliteID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2023-10-09T14:44:35Z ERROR lazyfilewalker.used-space-filewalker failed to start subprocess {“process”: “storagenode”, “satelliteID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “error”: “context canceled”}
2023-10-09T14:44:35Z ERROR pieces failed to lazywalk space used by satellite {“process”: “storagenode”, “error”: “lazyfilewalker: context canceled”, “errorVerbose”: “lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2023-10-09T14:44:35Z ERROR piecestore:cache error getting current used space: {“process”: “storagenode”, “error”: “filewalker: context canceled; filewalker: context canceled”, “errorVerbose”: “group:\n— filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n— filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-10-09T14:44:35Z INFO piecestore upload canceled {“process”: “storagenode”, “Piece ID”: “VBMK7DD4VJ2O5JFOHVKFEMUNNFKIHMJP6GHUCTRJ2U2QBLSMJOFQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 1703936, “Remote Address”: “23.237.191.122:45524”}
2023-10-09T14:44:35Z INFO piecestore upload canceled {“process”: “storagenode”, “Piece ID”: “Y43ZMXZURF2KF7NV4JHKXUB73N6BW25SP7BLTE3CJT7SB4RLOTIQ”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “PUT”, “Size”: 1900544, “Remote Address”: “184.104.224.98:55476”}
2023-10-09T14:44:35Z INFO piecestore upload canceled (race lost or node shutdown) {“process”: “storagenode”, “Piece ID”: “Y7LUWE6AHXHYUBDUNS7NCNNMIQV2ENEBNWZ2H36YYCUZBKAFGKOA”}
2023-10-09T14:44:35Z INFO piecestore upload canceled {“process”: “storagenode”, “Piece ID”: “5TKZO26APSYLJKXLJ2MGQFLMYDC2R74HP75IXH3DIJW6M7AOPXDQ”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “PUT”, “Size”: 794624, “Remote Address”: “157.211.109.157:45568”}
2023-10-09T14:44:35Z INFO piecestore upload canceled (race lost or node shutdown) {“process”: “storagenode”, “Piece ID”: “ZNZNVXOHC7XDZGHZJVRRGAG5UMYTRIBTNXYP4CZN6YHJRYBA3TRA”}
2023-10-09T14:44:36Z INFO Download finished. {“Process”: “storagenode-updater”, “From”: “https://github.com/storj/storj/releases/download/v1.88.3/storagenode-updater_linux_amd64.zip”, “To”: “/tmp/storagenode-updater_linux_amd64.1765104221.zip”}
2023-10-09T14:44:36Z INFO Restarting service. {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-10-09 14:44:36,287 INFO exited: storagenode-updater (exit status 1; not expected)
2023-10-09 14:44:37,303 INFO spawned: ‘storagenode-updater’ with pid 1424
2023-10-09 14:44:37,318 WARN received SIGQUIT indicating exit request
2023-10-09 14:44:37,319 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-10-09 14:44:37,320 INFO stopped: storagenode-updater (terminated by SIGTERM)
2023-10-09 14:44:37,541 INFO stopped: storagenode (terminated by SIGTERM)
2023-10-09 14:44:37,542 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)