I’m running this node on a TrueNAS system. I tried the commands you mentioned without success. The logs I mentioned are available in the TrueNAS app. Alternatively, maybe I can access these logs from the shell? I have access to that as well. It seems that this TrueNAS build is not the best (I ran into problems at setup too).
You can access logs from the shell:
- Open a shell from System settings
- List containers
sudo docker ps
- Search for the container that contains storj in its name and copy its container ID
- Filter the logs (the container ID below, c55ff406647a, is an example; substitute yours)
sudo docker logs c55ff406647a 2>&1 | grep GET_AUDIT | grep failed
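If you prefer not to copy the container ID by hand, the two steps can be combined; a rough one-liner, assuming exactly one running container has storj in its name:
sudo docker logs $(sudo docker ps --filter name=storj --format '{{.ID}}') 2>&1 | grep GET_AUDIT | grep failed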
However, since you didn’t redirect logs to a file, they can be lost if the container is re-created.
You may also use the web UI to check the logs of the storj container, but it’s not convenient to search for failed audits in the web UI. Alternatively, you may download the logs and search the log file locally.
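For example (a sketch; c55ff406647a is the same example container ID as above, replace it with yours), you could dump the logs to a file and search there:
sudo docker logs c55ff406647a > /tmp/storagenode.log 2>&1
grep GET_AUDIT /tmp/storagenode.log | grep failed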
Alexey:
Thank you.
I did have my node configured with a working email. I’ve not received any emails about being offline in the last month, and I’ve never received any suspension warnings. Lastly, I checked in on my node daily; it always showed Status: Online and QUIC OK (though yesterday it did say it had only been online for 8 hours, which makes no sense, since I received no emails).
The log output from your commands is as follows:
2023-09-30T11:23:33Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “IMTCD6B4FKWQIIIUST3NNKSZYHOEDKD7I6T7LYRBADANHOTUQT4A”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 8448, “Size”: 0, “Remote Address”: “34.146.139.227:58876”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-30T13:09:17Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “Y6CK5XLVCEAOUFXIYQFJBXYC7ZM5HF3HCLAB4LF64Q6MKWQTGORQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET_AUDIT”, “Offset”: 1354496, “Size”: 0, “Remote Address”: “34.148.62.121:55084”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-30T16:46:03Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 884224, “Size”: 0, “Remote Address”: “34.146.139.227:36776”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
The bad parts are the "Action": "GET_AUDIT" requests failing with "error": "file does not exist".
It means that the satellite has sent audit requests to your node to verify that your node has the data pieces that the satellite expects it to hold.
But your node responded that it does not have the requested files at all. That is the worst possible scenario.
So either they have been deleted or the node software does not have access to the correct location.
Depending on how much data you have already lost, this means that your node may get disqualified.
From the satellite IDs you can see that your node has lost data on more than one satellite.
You need to check what was/is the cause of the data loss. My first suggestion would be to take the node offline while keeping it running, i.e. cut it off from the internet. Why? Because a node can be offline for up to 30 days, but if it stays online and keeps getting audited for lost pieces, it might get disqualified on all satellites much faster, possibly before you have been able to diagnose and fix the underlying issue.
OK, thank you, good suggestion to take the node offline. It’s really stressful when the proverbial clock is ticking. I simply removed the port forwarding.
So I have been reading about various reasons for this situation, and they seem abundant. And yet, none that I have read so far seem applicable/likely.
The drives are all healthy, according to TrueNAS. And they are top quality, nearly new HDDs (about four months old).
I have made no changes that could either screw up permissions or change dir locations, since initial setup.
I only have this one node, so it isn’t two nodes with same identity.
If relevant, I have this node set up with a ZFS RAIDZ1 configuration.
Next steps?
I would pick a piece ID that has been logged as a failed audit and search for it in the logs.
Then you can check whether anything else was logged about that file, like what happened when it was uploaded, and maybe some other additional log messages.
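For example, reusing the docker logs command from earlier (again with the example container ID) and one of the piece IDs from your output:
sudo docker logs c55ff406647a 2>&1 | grep 6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA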
Second, I would check in the storage folder whether the piece is truly not there.
From the satellite ID you can get the storage blobs folder: Satellite info (Address, ID, Blobs folder, Hex).
The first 2 characters of the piece ID are the name of the folder. Check both the storage and the trash folders to see whether the piece is there.
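As a rough sketch (not TrueNAS-specific), assuming your storage location is /path/to/storage and assuming the on-disk file name is the piece ID without its first two characters, plus an extension, you could search blobs and trash case-insensitively:
PIECE=6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA
sudo find /path/to/storage/blobs /path/to/storage/trash -iname "${PIECE:2}*"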
If it has been successfully uploaded and it’s not there, then you have a problem. And if it is there, you have another problem.
Maybe it is a file system problem and you need to run a file system check and repair.
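Since you mentioned the node sits on ZFS (RAIDZ1), a scrub is the usual pool-level check and repair; a minimal sketch, assuming your pool is named data (substitute your actual pool name):
sudo zpool scrub data
sudo zpool status -v data
zpool status shows the scrub progress and lists any checksum errors it found.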
OK, excellent. I will look into these matters. Quick follow-up: is it sufficient to remove the port forwarding? The dashboard still shows the node as online, though QUIC shows as misconfigured.
So searching the logs for other errors with the problematic piece IDs, I find the following (as an example). Is it of any use? Not able to make any sense of it, beyond it confirming the piece does not exist.
2023-09-30T16:46:03Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET_AUDIT”, “Offset”: 884224, “Size”: 0, “Remote Address”: “34.146.139.227:36776”, “error”: “file does not exist”, “errorVerbose”: “file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:671\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
That is just the message about the failed audit again.
Maybe the upload messages concerning these specific pieces are no longer present in the log, as the log normally gets deleted when you remove the container.
But as said, you additionally have to check in the storage and trash folders whether the piece is there or not.
Would it be reasonable to assume that, at this point, knowing Docker is necessary for any possible troubleshooting here? I had planned to slowly learn Docker over the next year, but maybe I jumped the gun (e.g., I can’t even find the trash!). I discovered the Storj app in the TrueNAS charts, loved the project, so I gave it a shot. But I just started studying Linux less than a year ago for fun.
I’d be open to a paid tutor, if anyone’s interested.
I searched the docs for how to find the trash, but nothing (I’m guessing that is just obvious to most everyone).
Well, if people are willing to help with my oh, so basic questions: how do I find the trash in the storj docker container?
Unfortunately I am not on TrueNAS so I am not familiar with how the node setup looks on such a device or where to look for the underlying folders.
But I am sure others will be able to tell you exactly where you need to look for them.
I understand. It’s been a bit of a pickle: those in the TrueNAS Discord room don’t know Storj, and those here don’t know TrueNAS well, though Alexey seems generally familiar with it.
Generally there are many knowledgeable users on here. I am sure somebody will be able to tell you exactly where to look on a TrueNAS.
It’s the weekend, so you will have to wait until others see your post.
Thank you all so much for the help and encouragement!
I’ve fully shut my node down to be sure I’m not online. That gives me 30 days to work on this without stressing.
Hey, @Alexey: able to give me a hand with this? Do you know TrueNAS well enough to problem-solve it with me?
The node’s data folders are located in the dataset which your TrueNAS uses to store application data. It’s not hardcoded, so you need to search for it.
You may execute the command in the shell:
df -T --si
It should show the mount points, so you can locate where your dataset is mounted. Next, you need to navigate to the folder with the node’s data.
For example, for a dataset named data, the output may look like:
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 819M 0 819M 0% /dev
tmpfs tmpfs 201M 8.1M 193M 5% /run
boot-pool/ROOT/22.12.2 zfs 115G 2.9G 112G 3% /
tmpfs tmpfs 1.1G 107k 1.1G 1% /dev/shm
tmpfs tmpfs 105M 0 105M 0% /run/lock
tmpfs tmpfs 1.1G 13k 1.1G 1% /tmp
boot-pool/grub zfs 112G 8.7M 112G 1% /boot/grub
data zfs 127G 132k 127G 1% /mnt/data
data/ix-applications zfs 127G 132k 127G 1% /mnt/data/ix-applications
data/ix-applications/k3s zfs 127G 141M 127G 1% /mnt/data/ix-applications/k3s
data/ix-applications/docker zfs 129G 2.2G 127G 2% /mnt/data/ix-applications/docker
data/ix-applications/catalogs zfs 127G 59M 127G 1% /mnt/data/ix-applications/catalogs
data/ix-applications/releases zfs 127G 132k 127G 1% /mnt/data/ix-applications/releases
data/ix-applications/default_volumes zfs 127G 132k 127G 1% /mnt/data/ix-applications/default_volumes
data/ix-applications/releases/storj zfs 127G 132k 127G 1% /mnt/data/ix-applications/releases/storj
data/ix-applications/releases/storj/volumes zfs 127G 132k 127G 1% /mnt/data/ix-applications/releases/storj/volumes
data/ix-applications/releases/storj/charts zfs 127G 394k 127G 1% /mnt/data/ix-applications/releases/storj/charts
data/ix-applications/releases/storj/volumes/ix_volumes zfs 127G 132k 127G 1% /mnt/data/ix-applications/releases/storj/volumes/ix_volumes
data/ix-applications/releases/storj/volumes/ix_volumes/ix_data zfs 127G 263k 127G 1% /mnt/data/ix-applications/releases/storj/volumes/ix_volumes/ix_data
data/ix-applications/releases/storj/volumes/ix_volumes/ix_identity zfs 127G 263k 127G 1% /mnt/data/ix-applications/releases/storj/volumes/ix_volumes/ix_identity
...
So, your data for that example should be in /mnt/data/ix-applications/releases/storj/volumes/ix_volumes:
[-]$ ls -l /mnt/data/ix-applications/releases/storj/volumes/ix_volumes
total 17
drwxr-xr-x 4 apps apps 7 Sep 30 18:28 ix_data
drwxr-xr-x 2 apps apps 8 May 1 01:48 ix_identity
ix_data should contain all the folders, including blobs and trash.
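With that example layout, a quick check for one of the failing pieces might look like this (a sketch: adjust the dataset path to your own, and it assumes the on-disk file name is the piece ID without its first two characters, as described earlier):
PIECE=6QTWSBJFU42FKD4S52KXI5JG4P2M5SOD5YWXH7VBJXH7ICS6GCLA
DATA=/mnt/data/ix-applications/releases/storj/volumes/ix_volumes/ix_data
sudo find "$DATA/blobs" "$DATA/trash" -iname "${PIECE:2}*"
If find prints a path, the piece exists on disk (possibly in trash); if it prints nothing, the piece really is missing.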