I received emails this morning indicating I was suspended from the following satellites:
us-central-1
europe-west-1
asia-east-1
The message has the following line:
You won’t receive any new data on this Satellite until you resolve the issue causing audit failures on your node.
However, there is no indication of which issues need to be fixed to address these audit failures. The message is of little value.
Regards,
Gary
Check your log for "download failed" and "GET_AUDIT" entries. These are audit failures.
When I run the following command it shows 50 GET_AUDIT entries:
docker logs -t storagenode 2>&1 | grep -c GET_AUDIT
When I look at these specific log entries they are all like the following:
docker logs -t storagenode 2>&1 | grep GET_AUDIT
2020-05-14T15:44:10.942145874Z 2020-05-14T15:44:10.941Z INFO piecestore download started {"Piece ID": "NPQD2DRYPNYC7DPSIAPPJ7NTBWBKLEZGYACFSR3YJCO7DN5ITFSQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-05-14T15:44:11.138385877Z 2020-05-14T15:44:11.137Z INFO piecestore downloaded {"Piece ID": "NPQD2DRYPNYC7DPSIAPPJ7NTBWBKLEZGYACFSR3YJCO7DN5ITFSQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-05-14T15:51:55.012695688Z 2020-05-14T15:51:55.011Z INFO piecestore download started {"Piece ID": "QJNJFJKSQJZTYP3W7D64DRFPHUACZAUAZ3RREVIEJLJTD5SVA5LA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
2020-05-14T15:51:55.225571664Z 2020-05-14T15:51:55.223Z INFO piecestore downloaded {"Piece ID": "QJNJFJKSQJZTYP3W7D64DRFPHUACZAUAZ3RREVIEJLJTD5SVA5LA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT"}
These do not indicate to me that there are errors. What am I missing?
thebadcat:
What am I missing?
You are missing the second keyword. It’s “download failed” AND “GET_AUDIT” together.
@nerdatwork , thx…
There are also 11 entries like this one:
2020-05-14T16:56:41.142764658Z 2020-05-14T16:56:41.141Z ERROR piecestore download failed {"Piece ID": "IGJOLYY3BTNAK3BNYMYYFTTDWLKNSMI3OZMRSIXAWSR3T7Q4QZFA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 172.17.0.3:28967->35.236.66.70:50224: use of closed network connection", "errorVerbose": "write tcp 172.17.0.3:28967->35.236.66.70:50224: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:221\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:318\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1080\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func5.1:640\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
What does use of closed network connection mean and how would/should I fix it? Is there some sort of timeout value needed for whatever connection this may be referring to that I can adjust?
That error is shown when the timeout is hit for that operation. The TCP connection is closed, which results in the failed message.
Storj is looking into it.
Try docker logs -t storagenode 2>&1 | grep "GET_AUDIT" | grep "download failed"
@donald.m.motsinger thx… I’ve already used the following:
docker logs -t storagenode 2>&1 | grep -E "download failed|GET_AUDIT"
it achieves the same thing… well it gives both…
No, look closer. My command returns all lines where both terms occur. Your command returns all lines where either term occurs.
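To illustrate the difference, here is a minimal sketch that runs both filters over three shortened sample lines modeled on the log format in this thread. The helper name sample_log is made up for the example, and the third line is a hypothetical audit failure, not something taken from a real log.

```shell
# Three shortened sample lines modeled on the log format above. The last one
# is a hypothetical audit failure for illustration, not from a real log.
sample_log() {
cat <<'EOF'
INFO piecestore downloaded {"Action": "GET_AUDIT"}
ERROR piecestore download failed {"Action": "GET"}
ERROR piecestore download failed {"Action": "GET_AUDIT"}
EOF
}

# OR: count lines containing either term (all three sample lines match).
either=$(sample_log | grep -c -E "download failed|GET_AUDIT")

# AND: count lines containing both terms (only the last sample line matches).
both=$(sample_log | grep "GET_AUDIT" | grep -c "download failed")

echo "either: $either, both: $both"
```

Only the second count reflects failed audits. On a real node, the sample_log call would be replaced by docker logs -t storagenode 2>&1.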
@donald.m.motsinger
Yeah, you are correct. However, there are no lines that have both.
That’s a good thing. It means no audit failures.
@nerdatwork
Sorry, what do you mean by “Storj is looking into it”?
Is this being looked at by an official Storj support person or group?
Alexey
May 14, 2020, 7:19pm
@Alexey
Any idea what this error could be? Are there any knobs I could set to adjust whatever is timing out?
Did you recently recreate the container? If so you may not have any logs that actually show the error at this point. Most likely it’s the database locked issue. This is something that needs to be fixed in software, but you can vacuum the usedserials.db and defrag it. For some users that eliminates the problem.
@BrightSilence
Yes, I did remove/pull/restart after receiving the emails so I suppose you are correct the real error messages have been thrown away.
Anyway, how do I do this vacuum and defrag you speak of? And after that how do I remove the suspension?
BTW, given that there is a known bug out there, shouldn’t there be some sort of grace period before a node is suspended?
Alexey
May 14, 2020, 8:24pm
I’m running a docker version on Debian GNU/Linux.
Here’s the procedure I use to vacuum and check the databases:
Stop the node
Vacuum the dbs
integrity_check the dbs
Restart the node
Here’s a bash check which should do that for you. Please change the database directory to reflect where yours are on your system.
docker stop -t 300 storagenode &&
dbs=$(ls /opt/storj/storage/*.db)
c1="VACUUM;"
c2="PRAGMA integrity_check;"
for i in $dbs
do
sqlite3 $i "$c1"
done
for i in $dbs
do
sqlite3 $i "…
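The quoted script is cut off above. As a rough sketch of the same four steps (stop, vacuum, integrity check, restart), and not the original author’s full script, something like the following could work. The function name check_dbs is made up for illustration. The container name storagenode and the path /opt/storj/storage are taken from the commands in this thread and need to match your own setup.

```shell
# check_dbs is a hypothetical helper: it vacuums every *.db file in the given
# directory, then runs an integrity check on each one and prints the result.
check_dbs() {
    dir=$1
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue    # skip if the glob matched nothing
        sqlite3 "$db" "VACUUM;" || return 1
    done
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue
        # A healthy database prints a single line: ok
        echo "$db: $(sqlite3 "$db" "PRAGMA integrity_check;")"
    done
}

# Usage (the node must be stopped while the databases are modified):
#   docker stop -t 300 storagenode && check_dbs /opt/storj/storage && docker start storagenode
```

Quoting "$db" and looping over the glob directly, instead of over the output of ls, keeps the loop working when a path contains spaces. Any result other than ok from the integrity check means that database needs a closer look before the node is restarted.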
yes, the following disqualification is disabled
@Alexey
I’ve run the commands above…
How can I tell my suspension status?
Alexey
May 14, 2020, 9:03pm
Take a look at the dashboard; it will show which satellites (if any) your node is suspended on.
Please be aware it can take some time until the suspension is resolved. You need to respond successfully to a few audits before it recovers.