Heya,
I am a new storage node operator and started my node 3 days ago.
Everything was fine so far, but today I received a notification:
Your node has been disqualified on 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW . If you have any questions regarding this please check our Node Operators thread on Storj forum.
I checked the linked thread, but the error message did not match what I saw in the Docker logs.
The messages from the thread:
Download Errors:
2019-08-29T15:54:15.647Z INFO piecestore download failed {"Piece ID": "AXNYNZLQSU6FTH55AJPWK34BQCFDWG5EWFBPTNOVLOHA2KVXUT4Q", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET", "error": "piecestore: piecestore protocol: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "piecestore: piecestore protocol: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func3:504\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-12-15T19:56:00.530Z INFO piecestore download failed {"Piece ID": "25JGFFHGSZBEHEQMTYZ5QUWIAVCN5CCHDGX5O2VGJ7BXFQAK66IQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET", "error": "piecestore: piecestore protocol: write tcp 172.17.0.2:28967->[redacted]:36716: use of closed network connection", "errorVerbose": "piecestore: piecestore protocol: write tcp 172.17.0.2:28967->[redacted]:36716: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:189\n\tstorj.io/drpc/drpcwire.SplitN:25\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:233\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:266\n\tstorj.io/storj/pkg/pb.(*drpcPiecestoreDownloadStream).Send:1078\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func3:598\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
The following are the error messages from my node:
2020-04-17T20:27:06.287Z INFO piecestore upload started {"Piece ID": "52ACPWLOYZW5N6LAXU4BYCMOVEBRMX4U4YCHURY45FG6WM4CDDCA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Available Space": 475678746624}
2020-04-17T20:27:06.887Z INFO piecestore upload canceled {"Piece ID": "IOSSHQUY3KIKME4KJJFA45F3Z7YZ43UT5QCFIF3HXH775SKGXV4Q", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:362\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:215\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:987\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
I'm unsure what I need to do to resolve this. Can someone help me with it? Thank you.
That upload error may not be related to your issue, so let's find out what caused the DQ (disqualification). A canceled upload is common when your node loses the race for that piece, and it won't get your node DQed.
Your node has failed audits, and you need to check your log for them. Search your log for "failed" together with "GET_AUDIT"; those entries will contain the error message for each failed audit.
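For example, a quick way to filter those entries with grep (the sample log lines and the /tmp path here are illustrative only; with the default Docker logging you would pipe `docker logs storagenode 2>&1` into grep instead):

```shell
# Two sample log lines stand in for the real node log.
cat > /tmp/node.log <<'EOF'
2020-04-18T22:28:25.735Z ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "file does not exist"}
2020-04-17T20:27:06.287Z INFO piecestore upload started {"Action": "PUT"}
EOF

# Keep only failed audits: match GET_AUDIT entries, then filter for "failed".
grep GET_AUDIT /tmp/node.log | grep failed
```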
@nerdatwork Thank you for your comment.
I searched the logs, and the following are the 3 messages I see:
2020-04-18T22:28:25.735Z ERROR piecestore download failed {"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-20T17:28:41.723Z ERROR piecestore download failed {"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-21T07:54:01.788Z ERROR piecestore download failed {"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
I remember that roughly 72 hours ago my node was unreachable for a while due to a firewall misconfiguration.
Do you think that could have caused this missing-file error?
One more observation - only one satellite has disqualified the node.
How do I re-sync to get the node to rectify the error?
Thanks again for looking at this.
You have missing files. How is your HDD connected?
It's statically mounted as instructed in the docs.
I have an entry in /etc/fstab. Only one HDD, ext4 file system.
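The entry is along these lines (with the drive's real UUID in place of the placeholder; `blkid` shows the actual value):

```
# /etc/fstab — static mount for the storage drive (UUID is a placeholder)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/storage-hdd  ext4  defaults  0  2
```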
Did you keep 10% space as overhead?
Yes, it's a 1 TB hard drive that's mounted, but the storage node is configured to use 500 GB.
Right now about 425 GB is free.
Posting the docker command I used to start the storage node, in case it helps:

docker run -d \
    --restart unless-stopped \
    --stop-timeout 300 \
    -p <>:<> \
    -p <>:<> \
    -e WALLET="0x...." \
    -e EMAIL="...." \
    -e ADDRESS="...." \
    -e STORAGE="500GB" \
    --mount type=bind,source="<identity-path>",destination=/app/identity \
    --mount type=bind,source="/mnt/storage-hdd/storj",destination=/app/config \
    --name storagenode storjlabs/storagenode:beta
Disk usage:
/dev/sdc1 960349208 489167976 422328448 54% /mnt/storage-hdd
Once your node is DQed on a satellite, it can't be reinstated. Check your log for those piece IDs; let's see what it shows.
I am not sure if I am looking at all the logs, but this is what I get for the piece IDs mentioned above:
2020-04-18T22:28:25.682Z INFO piecestore download started {"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT"}
2020-04-18T22:28:25.735Z ERROR piecestore download failed {"Piece ID": "JDIOX2CHOKZC4EZFYAFUM3P3NOX4QNOFUE2LBA62OJCAZKKVKJUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-20T17:28:41.657Z INFO piecestore download started {"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-04-20T17:28:41.723Z ERROR piecestore download failed {"Piece ID": "ZYRAELH7TYAXDRB6CNCKRNJDIIL3L6HOTJY6YUQYEZQSZCHAVSKA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-04-21T07:54:01.785Z INFO piecestore download started {"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-04-21T07:54:01.788Z ERROR piecestore download failed {"Piece ID": "5HQJX342K7JDN4VEOQMQFNHJ4VYYDMZHLHQ2CHFNH2GEUMRXJFQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/pb/pbgrpc.init.0.func3:70\n\tstorj.io/common/rpc/rpcstatus.Wrap:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:559\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:466\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:105\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:56\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:93\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
I do not see any log entries storing these pieces, though.
This is why you should redirect the log to a file, so you can investigate issues like this.
When the Docker container is removed and recreated, the logs are destroyed, but when redirected they stay in a file on your HDD.
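One common way, per the Storj docs (the exact filename is your choice; /app/config maps to the directory you bind-mounted), is to set log.output in config.yaml and then recreate the container:

```yaml
# config.yaml (in the directory mounted at /app/config)
# Send the node log to a file on the storage HDD instead of stderr.
log.output: "/app/config/node.log"
```

After editing, stop and remove the container and run it again for the change to take effect.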
Thank you very much for the link.
I did not pay attention to the logs; I just assumed they must be persisted somewhere.
I will redirect the logs.
I will go through the FAQ to see if I am missing other such things.
I see that we cannot do anything once the node is DQed.
Does this also impact the payout from other satellites?
Do I need to create the node again or let it work with this state?
Thanks for the advice so far, it's been really helpful.
No. You will still get paid from other satellites.
How old is your node?
nerdatwork:
How old is your node?
I started the node 3 days ago, so not that old.
The satellite 118... will be decommissioned, but there's no set date yet. You should ideally start a new node only when your current node is DQed from all satellites.
Hi @cryptor, have you seen these threads?
Suspension mode and disqualification emails
Blueprint: Downtime Disqualification
I have also asked our on-call engineer to hop in and clarify further when they are online, but those 2 posts are a great place to start.