Hello: I struggled at first to get my node working, so there was (obviously) some downtime. But everything has been smooth for the whole month of September.
Yet, without any “suspension” warnings, today I received a Disqualification email from AP1 satellite. Should I expect more disqualification emails from the other satellites? Do I need to be on all satellites?
Also, my Dashboard has no such notices.
I don’t wish to complain (though I’m quite upset after all my work), but rather to understand more clearly what paths (if any) are now open to me.
Nobody can tell if it is too late. If your node has already lost too much data, then the probability of being disqualified is high. The same applies if your node keeps losing data.
You need to check your logs for audit errors. If you want advice from users here, you should post the error message and tell us more about your system and what problems you had before. This could be related to the data loss that seems to be happening on your node.
Which of the three logs would be most useful to post: storj, generate-identity, or setup?
“storj” seems to have nothing out of the ordinary.
“generate-identity” has nothing logged.
“setup” has no logs since 8-31.
Shall I post the last? I’m new to all this, but eager to learn and work hard to do so.
Thank you.
I’m running this node on a TrueNAS system. I tried the commands you mention without success. The logs I mentioned are available in the TrueNAS app. Otherwise, maybe I can access these logs from the shell? I also have access to that. It seems that this TrueNAS build is not the best (I ran into problems at setup too).
However, since you didn’t redirect logs to a file, they can be lost if the container is re-created.
You may also use the web UI to check logs (you need to check the storj container), but it’s not convenient to search for failed audits in the web UI. You may also download the logs and search the log file locally.
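If the node runs in Docker, a quick way to surface only the failed audits is to filter the container logs. This is a sketch: the container name `storagenode` is an assumption (the TrueNAS app may name it differently; check `docker ps`):

```shell
# Filter the container's logs down to audit requests that failed.
# "storagenode" is an assumed container name -- on TrueNAS the app's
# container may be named differently (see `docker ps`).
docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed
```

The `2>&1` matters because the node writes its log to stderr, so a plain pipe would miss it.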
I did have my node configured with a working email. I’ve not received any emails about being offline in the last month, and I’ve never received any suspension warnings. Lastly, I checked in on my node daily; it was always Status: Online, QUIC OK (though yesterday it said it had only been online for 8 hours, which makes no sense, since I received no emails).
The log output from your commands is as follows:
2023-09-30T11:23:33Z ERROR piecestore download failed {"process": "storagenode", "Piece ID": "IMTCD6B4FKWQIIIUST3NNKSZYHOEDKD7I6T7LYRBADANHOTUQT4A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "Offset": 8448, "Size": 0, "Remote Address": "34.146.139.227:58876", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2023-09-30T13:09:17Z ERROR piecestore download failed {"process": "storagenode", "Piece ID": "Y6CK5XLVCEAOUFXIYQFJBXYC7ZM5HF3HCLAB4LF64Q6MKWQTGORQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "Offset": 1354496, "Size": 0, "Remote Address": "34.148.62.121:55084", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2023-09-30T16:46:03Z ERROR piecestore download failed {"process": "storagenode", "Piece ID": "6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "Offset": 884224, "Size": 0, "Remote Address": "34.146.139.227:36776", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
The bad parts are the "file does not exist" errors on GET_AUDIT requests.
It means that the satellite has sent audit requests to your node to verify that your node has the data pieces that the satellite expects it to hold.
But your node responded that it does not have the requested files at all. That is the worst possible scenario.
So either they have been deleted or the node software does not have access to the correct location.
Depending on how much data you have already lost, this means that your node may get disqualified.
From the satellite IDs you can see that your node has lost data for multiple satellites.
You need to check what was (or is) causing the data loss. My first suggestion would be to take the node offline: keep it running, but cut it off from the internet. Why? Because a node can be offline for up to 30 days, but if it stays online and keeps getting audited for lost pieces, it may be disqualified on all satellites much faster, possibly before you have been able to diagnose and fix the underlying issue.
I would pick a piece id which has been logged as failed audit and search for it in the logs.
Then you can check whether anything was logged about that piece, such as what happened when it was uploaded, along with any other related messages.
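The step above can be sketched as a single grep over the container logs, using one of the failed piece IDs from the audit errors (again, `storagenode` is an assumed container name):

```shell
# Trace one piece's history through the logs: upload, downloads, deletes.
# Container name "storagenode" is an assumption -- adjust to your setup.
PIECE_ID="6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA"
docker logs storagenode 2>&1 | grep "$PIECE_ID"
```

If the only matches are the failed-audit lines, the upload happened before the current log started (e.g., the container was re-created).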
Second, I would check in the storage folder whether the piece is truly missing.
From the satellite id you can get the storage blobs folder: Satellite info (Address, ID, Blobs folder, Hex).
The first two characters of the piece ID are the name of the subfolder. Check both the storage and the trash folders to see whether the piece is there.
If it has been successfully uploaded and it’s not there, then you have a problem. And if it is there, you have another problem.
Maybe it is a file system problem and you need to run a file system check and repair.
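A minimal sketch of that lookup, assuming the usual on-disk layout where pieces are stored as lowercase names with a .sj1 extension under a two-character prefix directory inside each satellite's blobs folder (the storage path here is hypothetical; adjust it to your node, and double-check the layout against your own disk):

```shell
# Derive a piece's on-disk location from its Piece ID and look for it in
# both the live blobs and the trash.
# Layout assumption: <storage>/blobs/<satellite folder>/<first two chars
# of piece id>/<rest of piece id>.sj1, all lowercase; trash mirrors it.
STORAGE=/mnt/storagenode/storage   # hypothetical path -- adjust to your setup
PIECE_ID="6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA"

prefix=$(printf '%s' "$PIECE_ID" | cut -c1-2 | tr '[:upper:]' '[:lower:]')
rest=$(printf '%s' "$PIECE_ID" | cut -c3- | tr '[:upper:]' '[:lower:]')

# Print any matching piece file; no output means it is in neither place.
find "$STORAGE/blobs" "$STORAGE/trash" -path "*/$prefix/$rest.sj1" 2>/dev/null || true
```

No output means the piece is in neither the blobs nor the trash folder, which matches the "file does not exist" audit error.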
OK, excellent. I will look into these matters. Quick follow-up: is it sufficient to remove port forwarding? The dashboard still shows the node online, though it shows QUIC as misconfigured.
So, searching the logs for other errors with the problematic piece IDs, I find the following (as an example). Is it of any use? I’m not able to make any sense of it, beyond it confirming the piece does not exist.
2023-09-30T16:46:03Z ERROR piecestore download failed {"process": "storagenode", "Piece ID": "6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "Offset": 884224, "Size": 0, "Remote Address": "34.146.139.227:36776", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
That is just the message about the failed audit again.
Maybe the upload messages concerning these specific pieces are no longer present in the log, as the log is normally deleted when you remove the container.
But as said, additionally you have to check in the storage and trash folders if the piece is there or not.
Would it be reasonable to assume that, at this point, knowing Docker is necessary for any possible troubleshooting here? I had planned to slowly learn Docker over the next year, but maybe I jumped the gun (e.g., I can’t even find the trash!). I discovered the Storj app in the TrueNAS charts and loved the project, so I gave it a shot. But I only started studying Linux, for fun, less than a year ago.
I’d be open to a paid tutor, if anyone’s interested.
Unfortunately I am not on TrueNAS so I am not familiar with how the node setup looks on such a device or where to look for the underlying folders.
But I am sure others will be able to tell you exactly where you need to look for them.
I understand. It’s been a bit of a pickle: those in the TrueNAS Discord don’t know Storj, and those here don’t know TrueNAS well, though Alexey seems generally familiar with it.