For about 3 weeks I have been seeing the following on one of my nodes: USED space is flowing into TRASH, and the amount of TRASH keeps growing.
This is a significant regression: the node held 10.9 TB of data, and TRASH never exceeded 50 GB.
The current state is shown in the screenshot:
The logs contain a lot of errors; grouped by type, they are mostly these:
ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:855\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:530\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:855\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:766\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:783\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
ERROR piecestore upload failed {"process": "storagenode", "Piece ID": "HTBHJGSSCDIM62HRB4V5YCMPQNHIVS22NMTEQYLN7FDLBI52WLGA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 18688, "Remote Address": "172.17.0.1:50802"}
ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:855\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:766\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:783\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:855\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:505\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
ERROR piecedeleter could not send delete piece to trash {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "HQUVRYWMVRYSLMFNNJ2G54OYHVMBLXFVOFABSJBOGTEKYSMKCYQA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:396\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
FATAL Unrecoverable error {"process": "storagenode", "error": "Error creating tables for master database on storagenode: migrate: database: info opening file \"config/storage/info.db\" failed: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:347\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Migration.func1:657\n\tstorj.io/storj/private/migrate.(*Migration).Run:164\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:370\n\tmain.cmdRun:95\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250", "errorVerbose": "Error creating tables for master database on storagenode: migrate: database: info opening file \"config/storage/info.db\" failed: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:347\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Migration.func1:657\n\tstorj.io/storj/private/migrate.(*Migration).Run:164\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:370\n\tmain.cmdRun:95\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250\n\tmain.cmdRun:97\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tma ...
FATAL Unrecoverable error {"process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:905\n\tmain.cmdRun:120\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250"}
FATAL Unrecoverable error {"process": "storagenode", "error": "Error during preflight check for storagenode databases: preflight: database: \"storage_usage\": failed inserting test value: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).preflight:451\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Preflight:376\n\tmain.cmdRun:110\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250", "errorVerbose": "Error during preflight check for storagenode databases: preflight: database: \"storage_usage\": failed inserting test value: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).preflight:451\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Preflight:376\n\tmain.cmdRun:110\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250\n\tmain.cmdRun:112\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:399\n\tstorj.io/private/process.cleanup.func1:417\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:250"}
ERROR piecestore download failed {"process": "storagenode", "Piece ID": "UH6QXJ6RVTKVRJ756TD3I3LBCPFGYX2QBT3Z2JPVTFPLH7CPOOCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Remote Address": "172.17.0.1:45938", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:604\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
ERROR piecestore upload failed {"process": "storagenode", "Piece ID": "Z5ID3XHJMEDMRA7X7B7OMM3VCUT7UG3AZDNFM5T4DPHCKT374ZHA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "error": "manager closed: unexpected EOF", "errorVerbose": "manager closed: unexpected EOF\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:231", "Size": 163840, "Remote Address": "172.17.0.1:37886"}
ERROR piecestore upload failed {"process": "storagenode", "Piece ID": "JH43OV23P2VDUQOZIPXE73GPTVMAODVV2GUWIWIW6AFRJ4WVXESQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 1042688, "Remote Address": "172.17.0.1:49190"}
WARN console:service unable to get Satellite URL {"process": "storagenode", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "console: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted", "errorVerbose": "console: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted\n\tstorj.io/storj/storagenode/trust.(*Pool).getInfo:239\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:178\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:188\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:45\n\tnet/http.HandlerFunc.ServeHTTP:2122\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2936\n\tnet/http.(*conn).serve:1995"}
ERROR piecestore upload failed {"process": "storagenode", "Piece ID": "EBWNBLXVPPYMD6Z4N3TS523CNWXIINZJDFG6MHRJZ3ND7EMSXZ3A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 1536, "Remote Address": "172.17.0.1:50834"}
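The grouping of errors mentioned above can be reproduced with a short script. Below is a minimal Python sketch, assuming the console log format seen in these excerpts (a level, a free-text message, then a JSON object with an `error` field, optionally preceded by a timestamp); the function names `error_summary` and `group_log_errors` are hypothetical. It counts WARN/ERROR/FATAL entries by level, message and the first line of the error text, so the many `database is locked` entries with slightly different stack traces collapse into a single row. Truncated entries like the FATAL one above are handled by a regex fallback.

```python
import json
import re
import sys
from collections import Counter

# Level, free-text message, then the JSON fields. re.search skips an optional
# timestamp before the level. Assumes the log format shown in this post.
LINE_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\s+(.*?)\s+(\{.*)$")
# Fallback for entries whose JSON is cut off (e.g. truncated FATAL lines).
ERROR_RE = re.compile(r'"error":\s*"((?:[^"\\]|\\.)*)"')


def error_summary(fields_text):
    """Return the first line of the "error" field, without the stack trace."""
    try:
        err = json.loads(fields_text).get("error", "")
    except json.JSONDecodeError:
        m = ERROR_RE.search(fields_text)
        # Decode JSON escapes such as \" and \n in the captured string body.
        err = json.loads('"' + m.group(1) + '"') if m else ""
    return err.split("\n", 1)[0]


def group_log_errors(lines, levels=("WARN", "ERROR", "FATAL")):
    """Count (level, message, error) triples across the given log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line.rstrip("\n"))
        if not m or m.group(1) not in levels:
            continue
        level, message, fields_text = m.groups()
        counts[(level, message, error_summary(fields_text))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python group_errors.py node.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (level, message, err), n in group_log_errors(log).most_common():
            print(f"{n:6d}  {level:5s}  {message}: {err}")
```

If the node runs in Docker under a container named `storagenode` (an assumption), the log can first be saved with `docker logs storagenode > node.log 2>&1` and then passed to the script; the output lists the most frequent error groups first.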
There have been no changes to the host machine or the network during this time.
The other 3 nodes are operating normally.
The FATAL errors worry me, but it is not clear what to do about them.
I have stopped the node for now.
What can be done?