I ran through the steps in How to fix a “database disk image is malformed”.
Errors in the logs have subsided since then, yet my node is still suspended. What else do I need to do?
Running on Windows - no Docker.
ty
Wait.
Your node needs to have a few successful audits before it recovers. If the problem is fixed, it will get out of suspension soon.
OK, it’s been several days, but now that you mention it, I’m only seeing exclamation symbols on 2 nodes, as opposed to all of them, like before. I keep waiting to see if those clear up. Thanks.
Several days is a little long. To be sure, please check whether there are any failed audits in your logs.
I see the occasional “download fail”, but perhaps more concerning is this?
2020-05-09T11:15:24.303-0700 ERROR nodestats:cache Get held amount query failed {"error": "heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"", "errorVerbose": "group:\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313"}
That was reported elsewhere. It’s because a satellite hasn’t been updated. It’s not an issue on your end.
Cool. Well, I’ll just give it a bit longer and see how this shakes out.
Only suspended on one node now
Still suspended on saltlake, even after upgrade to 1.4.2
What are the errors that contain both GET_AUDIT and failed?
Upload failed:
2020-05-14T01:01:07.899-0700 ERROR piecestore upload failed {"Piece ID": "T7TKZCR32YIHNC3RIBG3EO7X2UIHKBIQCKTRH5HIMQVQ7TNEUJCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "error": "unexpected EOF", "errorVerbose": "unexpected EOF\n\tstorj.io/common/rpc/rpcstatus.Error:95\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:365\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:216\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:987\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
Download failed
2020-05-14T07:49:03.210-0700 ERROR piecestore download failed {"Piece ID": "RL7YM2YAY6YPYPADMPUENLSQAXIIDU5A2ACXZQZSLJR3D34XIZEQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "error": "tls: use of closed connection", "errorVerbose": "tls: use of closed connection\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1080\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func5.1:640\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
A handful of each since the upgrade.
Please filter for lines that contain both GET_AUDIT and failed at the same time. No other errors will suspend your node.
Ah, sorry. There is nothing since the 7th. That was before I repaired the userserialsdb.
Then you’ll be fine eventually. The score probably just dropped quite low on that satellite and needs a little more time to recover.
Yay, it finally cleared up. Thanks folks.
How can I do that in PowerShell?
sls GET_AUDIT "C:\Program Files\Storj\Storage Node\storagenode.log" | sls failed
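If you would rather script the check, here is a rough Python sketch of the same filter as the one-liner above: it keeps only the lines that mention both GET_AUDIT and failed. The function names failed_audits and scan_log are made up for illustration, and the matching is case-insensitive on the assumption that it should behave like sls does by default. Treat it as a starting point, not an official tool.

```python
from typing import Iterable, List


def failed_audits(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention both GET_AUDIT and 'failed'.

    Matching is case-insensitive, mirroring the default behaviour of
    PowerShell's Select-String (sls).
    """
    hits = []
    for line in lines:
        lowered = line.lower()
        if "get_audit" in lowered and "failed" in lowered:
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path: str) -> List[str]:
    """Read a log file line by line and return only the failed-audit lines."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return failed_audits(handle)


# Example use (path copied from the PowerShell command; adjust to your install):
# for line in scan_log(r"C:\Program Files\Storj\Storage Node\storagenode.log"):
#     print(line)
```

An empty result means there are no failed audits in that log file, which is what you want to see while waiting for the suspension to clear.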