Database locked,node crash

HI, got hit with this at night. node restarted as configured. with no error. v1.79.4
"
2023-06-16T00:08:36.002+0200 ERROR piecestore upload failed {“Piece ID”: “2IP42YCMDDYPF3HWYY6X3CNX2KKOIPC7BBV2OESEKWD6PFOE2BFQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “error”: “context canceled”, “errorVerbose”: “context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”, “Size”: 3328, “Remote Address”: “5.161.149.40:61930”}
2023-06-16T00:08:36.002+0200 ERROR piecestore upload failed {“Piece ID”: “XMP2IT3GPXECEI45KQIGITWMMRPSKMLV2U74BWQBN5EEETGN4BEQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “error”: “context canceled”, “errorVerbose”: “context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”, “Size”: 768, “Remote Address”: “5.161.128.79:21052”}
2023-06-16T00:08:36.079+0200 FATAL Unrecoverable error {“error”: “satellitesdb: database is locked”, “errorVerbose”: "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).SetAddress:40\n\tstorj.io/storj/storagenode/trust.(*Pool).Run:125\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-06-16T00:23:47.922+0200 ERROR piecestore download failed {“Piece ID”: “36RKAVM2GNZDA7X7O4MT7ZRPNQFVPJ2JPKN7EGT5WA2CGBBXDBCA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Remote Address”: “5.161.110.70:25980”, “error”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:604\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-06-16T00:23:47.922+0200 ERROR piecestore download failed {“Piece ID”: “4TJ5DWMQPZMGQ4TA7ECOGPZF5BJKB6S4VAOTTQP4EV73I5SAMGPQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Remote Address”: “38.88.241.42:38568”, “error”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:604\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2023-06-16T00:23:57.040+0200 ERROR piecestore upload failed {“Piece ID”: “VOQU6PDMLDPE7O2CUKPJJTVTCCVVGZEPKUCU3G42KZKZRKPO432Q”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “error”: “context canceled”, “errorVerbose”: “context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”, “Size”: 36864, “Remote Address”: “5.161.218.238:23128”}
Address": “38.104.84.242:61062”}
"

Already rebooted the whole system?
Sure there’s running only one node?
(Since database has been locked…)

Only one node. Definitely.
Its configured to start again 23 min after fatal.
Uptimerobot scans all 21 min. So i get noticed After crash and restart. No restart of the pc so far.
Service started without errors after 23min.

This happened to me when the drive couldn’t catch up, especially during a heavy IO. After I have moved the databases to SSD I no longer have this problem.

If the system is being backed by a UPS (and even when it isn’t, with risk of some data loss), you could consider to set commit time to handle the IO.

But moving database to SSD or something, could indeed be a better option.

But restarting the PC for a real starter, would be the primary logic to me…

may be the cause. will search for howto in the forum.
thx at all

1 Like