Node disqualification: looking for some guidance

Heya,

I am a new storage node operator and started my node 3 days ago.
So far everything was fine, but today I received a notification:

Your node has been disqualified on 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW . If you have any questions regarding this please check our Node Operators thread on Storj forum.

I checked the linked thread, but the error message did not match what I saw in the docker logs.
This is the message from the thread:

Download Errors:
2019-08-29T15:54:15.647Z INFO piecestore download failed {"Piece ID": "AXNYNZLQSU6FTH55AJPWK34BQCFDWG5EWFBPTNOVLOHA2KVXUT4Q", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET", "error": "piecestore: piecestore protocol: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "piecestore: piecestore protocol: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func3:504\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-12-15T19:56:00.530Z        INFO    piecestore      download failed {"Piece ID": "25JGFFHGSZBEHEQMTYZ5QUWIAVCN5CCHDGX5O2VGJ7BXFQAK66IQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET", "error": "piecestore: piecestore protocol: write tcp 172.17.0.2:28967->[redacted]:36716: use of closed network connection", "errorVerbose": "piecestore: piecestore protocol: write tcp 172.17.0.2:28967->[redacted]:36716: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:189\n\tstorj.io/drpc/drpcwire.SplitN:25\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:233\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:266\n\tstorj.io/storj/pkg/pb.(*drpcPiecestoreDownloadStream).Send:1078\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func3:598\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

The following is the error message from my node:

2020-04-17T20:27:06.287Z	INFO	piecestore	upload started	{"Piece ID": "52ACPWLOYZW5N6LAXU4BYCMOVEBRMX4U4YCHURY45FG6WM4CDDCA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Available Space": 475678746624}
2020-04-17T20:27:06.887Z	INFO	piecestore	upload canceled	{"Piece ID": "IOSSHQUY3KIKME4KJJFA45F3Z7YZ43UT5QCFIF3HXH775SKGXV4Q", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:362\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:215\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:987\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

I'm unsure what I need to do to resolve this.
Can someone help me with this? Thank you.

This may not be related to your issue, so let’s find out what caused the DQ (disqualification).

This is common when your node loses the race for a piece, and it won’t DQ your node.

Your node has failed audits, and you need to check your log for them. Search your log for failed and GET_AUDIT; those entries will contain the error message for the failed audits.
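For example, something like this should surface the failed audit entries (assuming your container is named storagenode; adjust to your setup):

# list failed audit downloads from the container logs
docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed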

@nerdatwork Thank you for your comment.

I searched the logs, and the following are the three messages that I see:

2020-04-18T22:28:25.735Z	ERROR	piecestore	download failed	{"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-20T17:28:41.723Z	ERROR	piecestore	download failed	{"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-21T07:54:01.788Z	ERROR	piecestore	download failed	{"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

I remember that at one point, about 72 hours ago, my node was unreachable due to my firewall config.
Do you think that could have caused this missing file error?

One more observation - only one satellite has disqualified the node.

How do I re-sync the node to rectify this error?

Thanks again for looking at this.

You have missing files. How is your HDD connected?

It's statically mounted as instructed in the docs.
I have an entry in /etc/fstab. Only one HDD, ext4 file system.
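For reference, the fstab entry looks roughly like this (the UUID is a placeholder, the mount point is the real one):

# storage node data drive
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/storage-hdd  ext4  defaults  0  2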

Did you keep 10% space as overhead?

Yes, it's a 1 TB hard drive that's mounted, but the storage node is configured to use only 500 GB.

Right now it has about 425 GB free.

Posting the docker command I used to start the storage node, in case it helps:

docker run -d \
    --restart unless-stopped \
    --stop-timeout 300 \
    -p <>:<> \
    -p <>:<> \
    -e WALLET="0x...." \
    -e EMAIL="...." \
    -e ADDRESS="...." \
    -e STORAGE="500GB" \
    --mount type=bind,source="<identity-path>",destination=/app/identity \
    --mount type=bind,source="/mnt/storage-hdd/storj",destination=/app/config \
    --name storagenode storjlabs/storagenode:beta

Disk usage (df, 1K blocks):

Filesystem      1K-blocks      Used Available Use% Mounted on
/dev/sdc1       960349208 489167976 422328448  54% /mnt/storage-hdd

Once your node is DQed on a satellite it can’t be reinstated. Check your log for those piece IDs; let’s see what it shows.
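For example (a rough sketch, again assuming the container is named storagenode), searching for one of the piece IDs from your post should show its full history in the current logs:

# show every log entry mentioning a given piece ID
docker logs storagenode 2>&1 | grep JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ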

I am not sure if I am looking at all the logs, but this is what I get for the piece IDs mentioned above:

2020-04-18T22:28:25.682Z	INFO	piecestore	download started	{"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT"}
2020-04-18T22:28:25.735Z	ERROR	piecestore	download failed	{"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-20T17:28:41.657Z	INFO	piecestore	download started	{"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-04-20T17:28:41.723Z	ERROR	piecestore	download failed	{"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-21T07:54:01.785Z	INFO	piecestore	download started	{"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-04-21T07:54:01.788Z	ERROR	piecestore	download failed	{"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

I do not see any log entries for these pieces being stored, though.

This is why you should redirect the log to a file, so you can investigate issues like this.

When the docker container is removed and recreated (for example during an update), the logs are destroyed, but when redirected they stay in a file on your HDD.
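If I remember the FAQ right, the short version is to add a log.output line to the config.yaml in your storage location and then restart the container, something like:

# in config.yaml (the storage location mounted at /app/config)
log.output: "/app/config/node.log"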


Thank you very much for the link.
I did not pay attention to the logs; I just assumed they must be persisted somewhere.

I will redirect the logs.
I will go through the FAQ to see if I am missing other such things.

I see that we cannot do anything once the node is DQed.

  • Does this also impact the payout from the other satellites?
  • Do I need to create the node again, or let it keep working in this state?

Thanks for the advice so far, it's been really helpful.


No. You will still get paid from other satellites.

How old is your node?

I started the node 3 days ago, so not that old.

The satellite 118... will be decommissioned soon, though there’s no set date. You should ideally start a new node only when your node is DQed from all satellites.


Hi @cryptor, have you seen this thread?

Suspension mode and disqualification emails

Blueprint: Downtime Disqualification

I also asked our on-call engineer to hop in and clarify further when they are online. But those two posts are a great place to start.
