Node still suspended after db repair

I ran through the steps in How to fix a “database disk image is malformed”.

Errors in the logs have subsided since then, yet my node is still suspended. What else do I need to do?

Running on Windows - no Docker.

ty

Wait.

Your node needs to have a few successful audits before it recovers. If the problem is fixed, it will get out of suspension soon.

OK, it’s been several days, but now that you mention it, I’m only seeing exclamation symbols on 2 nodes, as opposed to all of them, like before. I keep waiting to see if those clear up. Thanks.

Several days is a little long. To be sure, please check whether there are failed audits in your logs.

I see the occasional “download fail”, but perhaps more concerning is this?

2020-05-09T11:15:24.303-0700 ERROR nodestats:cache Get held amount query failed {"error": "heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"; heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"", "errorVerbose": "group:\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313\n--- heldamount service error: protocol error: unknown rpc: \"/heldamount.HeldAmount/GetPayment\"\n\tstorj.io/drpc/drpcwire.UnmarshalError:26\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:156\n\tstorj.io/drpc/drpcmanager.(*Manager).manageStreamPackets:313"}

That was reported elsewhere. It’s because a satellite hasn’t been updated. It’s not an issue on your end.

Cool. Well, I’ll just give it a bit longer and see how this shakes out.

Only suspended on one node now :sweat_smile:

Still suspended on saltlake, even after upgrade to 1.4.2 :thinking:

What about errors with both GET_AUDIT and failed?

Upload failed:
2020-05-14T01:01:07.899-0700 ERROR piecestore upload failed {"Piece ID": "T7TKZCR32YIHNC3RIBG3EO7X2UIHKBIQCKTRH5HIMQVQ7TNEUJCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "error": "unexpected EOF", "errorVerbose": "unexpected EOF\n\tstorj.io/common/rpc/rpcstatus.Error:95\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:365\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:216\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:987\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Download failed:
2020-05-14T07:49:03.210-0700 ERROR piecestore download failed {"Piece ID": "RL7YM2YAY6YPYPADMPUENLSQAXIIDU5A2ACXZQZSLJR3D34XIZEQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "error": "tls: use of closed connection", "errorVerbose": "tls: use of closed connection\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1080\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func5.1:640\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

A handful of each since the upgrade.

Please filter for lines that contain both GET_AUDIT and failed at the same time. Other errors will not suspend your node.

Ah, sorry. There is nothing since the 7th. That was before I repaired the userserialsdb.

Then you’ll be fine eventually. The score probably just dropped quite low on that satellite and needs a little more time to recover.

Yay, it finally cleared up. Thanks folks.

How can I do that in PowerShell?

sls GET_AUDIT "C:\Program Files\Storj\Storage Node\storagenode.log" | sls failed
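
If PowerShell isn't handy, the same filter can be sketched in Python. This is only a rough equivalent, not an official tool: the function name `failed_audits` is made up for illustration, and the log path is simply the one from the command above, so adjust it if your install differs. The match is case-insensitive, which is what `sls` does by default.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: log location copied from the PowerShell command above.
LOG_PATH = Path(r"C:\Program Files\Storj\Storage Node\storagenode.log")


def failed_audits(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that mention both GET_AUDIT and failed (case-insensitive)."""
    for line in lines:
        lowered = line.lower()
        if "get_audit" in lowered and "failed" in lowered:
            yield line.rstrip("\n")


if __name__ == "__main__":
    if LOG_PATH.is_file():
        # errors="replace" keeps the scan going if the log contains odd bytes.
        with LOG_PATH.open(encoding="utf-8", errors="replace") as log:
            matches = list(failed_audits(log))
        for match in matches:
            print(match)
        print(f"{len(matches)} failed audit line(s) found")
    else:
        print(f"Log not found at {LOG_PATH}; adjust LOG_PATH")
```

No matching lines after the repair date means there are no new failed audits, and the node should come out of suspension once enough audits succeed.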