Disqualification & Suspension

Hello: I struggled at first to get my node working, so there was (obviously) some downtime. But everything has been smooth for the whole month of September.

Yet, without any “suspension” warnings, today I received a disqualification email from the AP1 satellite. Should I expect more disqualification emails from the other satellites? Do I need to be on all satellites?

Also, my Dashboard has no such notices.

I don’t wish to complain (though I am quite upset after all my work), but rather to understand more clearly what paths (if any) are now open to me.

Thank you.

Your online score is not the problem; the audit scores are.
That means your node has lost data, or at least did not respond to audit requests in time.

This is the issue you have to solve.

What are the main causes of those two effects? And from your comment, am I to assume it is not too late to fix this?

Thank you.

Nobody can tell whether it is too late. If your node has already lost too much data, the probability of being disqualified is high; the same applies if your node keeps losing data.

You need to check your logs for audit errors. If you want advice from users here, you should post the error messages and tell us more about your system and what problems you had before; those could be related to the data loss that seems to be happening on your node.

Which of the three logs would be most useful to post: storj, generate-identity or setup?
“storj” seems to have nothing out of the ordinary.
“generate-identity” has nothing logged.
“setup” has no logs since 8-31.
Shall I post the last one? I’m new to all this, but eager to learn and willing to work hard to do so.
Thank you.

I don’t know the logs you are referring to.

Please check here:


I’m running this node on a TrueNAS system. I tried the commands you mention without success. The logs I mentioned are available in the TrueNAS app. Otherwise, maybe I can access these logs from the shell? I also have access to that. It seems that this TrueNAS build is not the best (I ran into problems at setup too).

You can access the logs from the shell:

  1. Open a shell from System Settings.
  2. List the running containers:
sudo docker ps
  3. Find the container with “storj” in its name and copy its container ID.
  4. Filter the logs, replacing c55ff406647a with your container ID:
sudo docker logs c55ff406647a 2>&1 | grep GET_AUDIT | grep failed

However, since you didn’t redirect the logs to a file, they can be lost if the container is re-created.
You may also check the logs in the web UI (look at the storj container), but searching for failed audits there is not convenient. Alternatively, you can download the logs and search locally in the log file.
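For example, a minimal way to save a local copy and search it (using the same example container ID as above; substitute your own):

# Dump the container's logs to a local file
sudo docker logs c55ff406647a > node.log 2>&1
# Search the saved file for failed audits
grep GET_AUDIT node.log | grep failed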

Alexey:

Thank you.

I did have my node configured with a working email. I’ve not received any emails for being offline in the last month, and I’ve never received any suspension warnings. Lastly, I checked in on my node daily; it was always Status: Online, QUIC OK (though yesterday it did say it was only online for 8 hours, which makes no sense, since I received no emails).

The log output from your commands is as follows:

2023-09-30T11:23:33Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “IMTCD6B4FKWQIIIUST3NNKSZYHOEDKD7I6T7LYRBADANHOTUQT4A”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 8448, “Size”: 0, “Remote Address”: “34.146.139.227:58876”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-30T13:09:17Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “Y6CK5XLVCEAOUFXIYQFJBXYC7ZM5HF3HCLAB4LF64Q6MKWQTGORQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET_AUDIT”, “Offset”: 1354496, “Size”: 0, “Remote Address”: “34.148.62.121:55084”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-30T16:46:03Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 884224, “Size”: 0, “Remote Address”: “34.146.139.227:36776”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}

The bad parts are the “Action”: “GET_AUDIT” and “error”: “file does not exist” fields.
It means that the satellite has sent audit requests to your node to verify that your node has the data pieces that the satellite expects it to hold.
But your node responded that it does not have the requested files at all. That is the worst possible scenario.
So either they have been deleted or the node software does not have access to the correct location.
Depending on how much data you have already lost, this means that your node may get disqualified.
From the satellite IDs you can see that your node has lost data for different satellites.
You need to check what was/is the cause of the data loss. My first suggestion would be to take the node offline: keep it running, but cut it off from the internet. Why? Because a node can be offline for up to 30 days, but if it stays online and keeps getting audited for lost pieces, it might get disqualified from all satellites much faster, possibly before you have been able to diagnose and fix the underlying issue.


OK, thank you, good suggestion to take the node offline. It’s really stressful when the proverbial clock is ticking. I simply removed the port forwarding.

So I have been reading about various reasons for this situation, and they seem abundant. And yet, none that I have read so far seem applicable/likely.

The drives are all healthy, according to TrueNAS. And they are top quality, nearly new HDDs (about four months old).

Since the initial setup, I have made no changes that could either screw up permissions or change directory locations.

I only have this one node, so it isn’t a case of two nodes with the same identity.

If relevant, I have this node set up with a ZFS RAIDZ1 configuration.

Next steps?

I would pick a piece ID that has been logged as a failed audit and search for it in the logs.
Then you can check whether anything else was logged about that piece, such as what happened when it was uploaded and possibly some other related log messages.
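For example (using the piece ID from the third log line above and the same example container ID; substitute your own values):

# Search the container's logs for every message mentioning the failed piece
sudo docker logs c55ff406647a 2>&1 | grep 6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA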

Second, I would check in the storage folder whether the piece is truly missing.
From the satellite ID you can find the corresponding blobs folder: Satellite info (Address, ID, Blobs folder, Hex).
The first two characters of the piece ID are the name of the subfolder. Check both the storage and the trash folders for the piece.
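A rough way to search from the TrueNAS shell, as a sketch: the path below is a placeholder (adjust it to wherever your dataset is mounted), and the search term is a fragment of the piece ID with the first two characters dropped, since those become the folder name; -iname keeps the match case-insensitive:

# Search both the blobs and trash folders for the failed piece
# (path and piece-ID fragment are examples; adjust to your setup)
sudo find /mnt/pool/storj/storage/blobs /mnt/pool/storj/storage/trash -iname "*twsbjfu42*"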

If it was successfully uploaded and it’s not there, then you have a problem. And if it is there, you have a different problem.
Maybe it is a file system problem and you need to run a file system check and repair.
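Since you mentioned a ZFS RAIDZ1 pool, the closest equivalent there is a scrub; a minimal sketch, assuming your pool is named tank (substitute your actual pool name):

# Start a scrub of the pool (replace tank with your pool name)
sudo zpool scrub tank
# Check scrub progress and whether any errors were found
sudo zpool status -v tank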


OK, excellent. I will look into these matters. Quick follow-up: is it sufficient to remove the port forwarding? The dashboard still shows the node as online, though it shows QUIC as misconfigured.

So, searching the logs for other errors with the problematic piece IDs, I find the following (as an example). Is it of any use? I’m not able to make any sense of it, beyond it confirming that the piece does not exist.

2023-09-30T16:46:03Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 884224, “Size”: 0, “Remote Address”: “34.146.139.227:36776”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}

That is just the message about the failed audit again.
Maybe the upload messages concerning these specific pieces are no longer present in the log, as the log is normally deleted when you remove the container.

But as said, you additionally have to check in the storage and trash folders whether the piece is there or not.

Would it be reasonable to assume that, at this point, knowing Docker is necessary for any possible troubleshooting here? I had planned to slowly learn Docker over the next year, but maybe I jumped the gun (e.g., I can’t even find the trash!). I discovered the Storj app in the TrueNAS charts, loved the project, and so I gave it a shot. But I only started studying Linux less than a year ago, for fun.

I’d be open to a paid tutor, if anyone’s interested :)

You can go through Storj’s docs at docs.storj.io and learn whatever you need or ask in the forum.

I searched the docs for how to find the trash, but found nothing (I’m guessing that is just obvious to almost everyone).

Well, if people are willing to help with my oh-so-basic questions: how do I find the trash in the storj docker container?

Unfortunately I am not on TrueNAS so I am not familiar with how the node setup looks on such a device or where to look for the underlying folders.
But I am sure others will be able to tell you exactly where you need to look for them.
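That said, on a plain Docker setup the node’s data, including the trash folder, normally lives under /app/config/storage inside the container. I can’t confirm whether the TrueNAS app maps it the same way, so treat the path below as an assumption and substitute your own container ID:

# List the trash folder from inside the running container
# (/app/config/storage is the standard storagenode image layout; the TrueNAS app may differ)
sudo docker exec c55ff406647a ls /app/config/storage/trash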

I understand. It’s been a bit of a pickle: those in the TrueNAS Discord room don’t know Storj, and those here don’t know TrueNAS well, though Alexey seems generally familiar with it.