Node is offline and showing error in logs

Hi.
My node is offline. I ran the log command and received this:
$ sudo docker logs --tail 30 storagenode

5443927s"}
2020-05-15T13:29:57.228Z ERROR piecestore:cache error getting current space used calculation: {"error": "lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message"}
2020-05-15T13:29:57.229Z ERROR contact:service ping satellite failed {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:29:57.229Z ERROR contact:service ping satellite failed {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:29:57.230Z INFO contact:service context cancelled {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-05-15T13:29:57.230Z INFO contact:service context cancelled {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2020-05-15T13:29:57.231Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}
Error: lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message
2020-05-15T13:30:59.670Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-05-15T13:30:59.671Z INFO tracing disabled
2020-05-15T13:30:59.673Z INFO Operator email {"Address": "ilan.klein1@gmail.com"}
2020-05-15T13:30:59.673Z INFO Operator wallet {"Address": "0x99969Df2dA9BF780Cfd62D7cC22f77E5BdB332df"}
2020-05-15T13:31:00.009Z INFO db.migration Database Version {"version": 36}
2020-05-15T13:31:00.735Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-05-15T13:31:01.380Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-05-15T13:31:01.380Z INFO bandwidth Performing bandwidth usage rollups
2020-05-15T13:31:01.382Z INFO Node 1FAXWhjtehvS8Dyj7avb5tvqrCderxtKE3XZZJ2vvYnTnfoQYt started
2020-05-15T13:31:01.382Z INFO Public server started on [::]:28967
2020-05-15T13:31:01.382Z INFO Private server started on 127.0.0.1:7778
2020-05-15T13:31:01.384Z INFO trust Scheduling next refresh {"after": "6h37m8.303837355s"}
2020-05-15T13:31:02.349Z ERROR piecestore:cache error getting current space used calculation: {"error": "lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message"}
2020-05-15T13:31:02.350Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}
2020-05-15T13:31:02.350Z ERROR contact:service ping satellite failed {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:31:02.350Z INFO contact:service context cancelled {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2020-05-15T13:31:02.350Z ERROR contact:service ping satellite failed {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:31:02.351Z INFO contact:service context cancelled {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Error: lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message

Any idea what is going on?
Thanks
Ilan

Can you show your docker run command? Remove any personal info from it.

Yes.
sudo docker run -d --restart unless-stopped -p 34567:28967 \
  -e WALLET="XXXXXXXXXX" \
  -e EMAIL="email@gmail.com" \
  -e ADDRESS="xxxx.hopto.org:34567" \
  -e BANDWIDTH="61.5TB" \
  -e STORAGE="7TB" \
  --mount type=bind,source="/home/pi/SeagateCopy/storj_identity/identity/storagenode",destination=/app/identity \
  --mount type=bind,source="/media/Seagate_8T/Node_1",destination=/app/config \
  --name storagenode storjlabs/storagenode:beta

Thanks
Ilan

This is the error I get when I try to look at the dashboard:
Error response from daemon: Container 5159439a44e58a5988a2c2abafa9bcfd5c9590a436733c5a3f301ae932d8b4dc is restarting, wait until the container is running

Ilan

Can you check if your identity folder has 6 files in it?

Yes.
ca.1572022556.cert ca.key identity.cert
ca.cert identity.1572022556.cert identity.key

Ilan

How is your HDD connected ?

Hi.
It is connected through a USB 3 port.
Thanks
Ilan

@Alexey Your insight please.

" Linux Users: You must static mount via /etc/fstab. Failure to do so will put you in high risk of failing audits and getting disqualified."

https://documentation.storj.io/setup/cli/storage-node
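For reference, a static mount via /etc/fstab looks roughly like this (a sketch; the UUID below is a placeholder, take the real one from blkid, and the mount point is the one from your docker run command):

```shell
# find the drive's UUID (stable across reboots, unlike /dev/sda1)
sudo blkid /dev/sda1

# then add a line like this to /etc/fstab (placeholder UUID):
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /media/Seagate_8T  ext4  defaults  0  2

# verify it mounts without rebooting
sudo mount -a
```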


First you need to fix the connection problem: your node can't reach the satellites.
Second, check the connected drive; I have a feeling it's an NTFS filesystem on the Linux host.
If so, you should stop and remove the storagenode container, disconnect the drive, connect it to a Windows system and check it for errors, then safely eject it and connect it back to the Linux host.
I would also suggest migrating from NTFS to ext4 as soon as possible. For that you need to copy all the data somewhere else, format the drive as ext4, and copy all the data back.
If you lose the data, your node will be disqualified, so be careful.
We can discuss migration options in a separate thread, if needed.

And of course, you must make a static mount, as @peem mentioned.

Hi, @Alexey.
The node is connected. I can reach it.
The drive is not NTFS, it is formatted for Linux. This is what I get when I run the df -hT command:
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 15G 3.6G 11G 27% /
devtmpfs devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs tmpfs 2.0G 84K 2.0G 1% /dev/shm
tmpfs tmpfs 2.0G 8.5M 1.9G 1% /run
tmpfs tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mmcblk0p1 vfat 253M 54M 199M 22% /boot
/dev/sda1 ext4 7.3T 93M 6.9T 1% /media/Seagate_8T
tmpfs tmpfs 391M 0 391M 0% /run/user/1000

It is mounted as an fstab mount.
I had the same issue with the same machine some time ago, and you advised me then to reformat the hard drive from NTFS to a Linux-native filesystem. I did that, remounted the hard drive, deleted the old data, and started a new node with a new identity.
Now this node behaves the same way. I have three other nodes that are working without a hitch. They are all on the same network and connected to a UPS, so there was no sudden shutdown or anything like that.
I would like to solve this before I try another identity.
Thanks
Ilan

3 things stand out:

  • satellite ping issue
  • 2x data blobs being unreadable
  • container restarting (probably caused by the previous two)

I would first try to solve the satellite ping issue. you're using a dynamic DNS config for what i imagine is a floating WAN IP address. depending on what the ping check expects, e.g. a response from the node, your DDNS config may be breaking the NAT/port forwarding of the node's public port, preventing the satellites' pings from reaching the node.

i don't know what the severity of the unreadable blobs is, but i'd try stat-ing the mentioned blobs manually and also running file against them. perhaps someone who has been reading the forums more can say what the usual causes/fixes are for the "lstat {blob name}: bad message" errors.
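for what it's worth, a quick way to do that poking (blob paths copied from the log above; the mount point is guessed from your df output, so adjust if yours differs):

```shell
# "bad message" from lstat is EBADMSG, which on ext4 usually points at
# on-disk corruption rather than a storagenode bug. check the files directly:
BLOBS=/media/Seagate_8T/Node_1/storage/blobs

for f in \
  "$BLOBS/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1" \
  "$BLOBS/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1"
do
  stat "$f" || echo "unreadable: $f"   # same lstat call the node makes
  file "$f" 2>/dev/null                # what the content looks like, if readable
done

# the kernel log usually names the underlying I/O error:
dmesg | grep -iE 'ext4|i/o error' | tail -n 20
```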

Get pricing-model/join date failed {"error": "context canceled"} is a non-issue, from what i remember, as long as your identity files are correct, which you've verified.

keep checking docker ps and the container log to see whether the container has stopped auto-restarting, so that you know you've cleared all the service's startup errors.

go to http://nmap.online-domain-tools.com and use a custom scan with "-v -Pn -p 34567 DNS/IP" to check public accessibility of port 34567 (the external port in your -p mapping).

also, in answering the query about the FS type of the drive you provided the output of df -hT. can you please provide the output of mount | grep -E "SeagateCopy|Seagate_8T" so that we can actually see the FS type and mount options of both of those drives.