Node is offline and showing error in logs

Hi.
My node is offline. I ran the log command and received this:
$ sudo docker logs --tail 30 storagenode

5443927s"}
2020-05-15T13:29:57.228Z ERROR piecestore:cache error getting current space used calculation: {"error": "lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message"}
2020-05-15T13:29:57.229Z ERROR contact:service ping satellite failed {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:29:57.229Z ERROR contact:service ping satellite failed {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:29:57.230Z INFO contact:service context cancelled {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-05-15T13:29:57.230Z INFO contact:service context cancelled {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2020-05-15T13:29:57.231Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}
Error: lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message
2020-05-15T13:30:59.670Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-05-15T13:30:59.671Z INFO tracing disabled
2020-05-15T13:30:59.673Z INFO Operator email {"Address": "ilan.klein1@gmail.com"}
2020-05-15T13:30:59.673Z INFO Operator wallet {"Address": "0x99969Df2dA9BF780Cfd62D7cC22f77E5BdB332df"}
2020-05-15T13:31:00.009Z INFO db.migration Database Version {"version": 36}
2020-05-15T13:31:00.735Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-05-15T13:31:01.380Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-05-15T13:31:01.380Z INFO bandwidth Performing bandwidth usage rollups
2020-05-15T13:31:01.382Z INFO Node 1FAXWhjtehvS8Dyj7avb5tvqrCderxtKE3XZZJ2vvYnTnfoQYt started
2020-05-15T13:31:01.382Z INFO Public server started on [::]:28967
2020-05-15T13:31:01.382Z INFO Private server started on 127.0.0.1:7778
2020-05-15T13:31:01.384Z INFO trust Scheduling next refresh {"after": "6h37m8.303837355s"}
2020-05-15T13:31:02.349Z ERROR piecestore:cache error getting current space used calculation: {"error": "lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message", "errorVerbose": "group:\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message"}
2020-05-15T13:31:02.350Z ERROR nodestats:cache Get pricing-model/join date failed {"error": "context canceled"}
2020-05-15T13:31:02.350Z ERROR contact:service ping satellite failed {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:31:02.350Z INFO contact:service context cancelled {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2020-05-15T13:31:02.350Z ERROR contact:service ping satellite failed {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:130\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-05-15T13:31:02.351Z INFO contact:service context cancelled {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Error: lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1: bad message; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1: bad message

Any idea what is going on?
Thanks
Ilan

Can you show your docker run command? Remove any personal info from it.

Yes.
sudo docker run -d --restart unless-stopped -p 34567:28967 \
  -e WALLET="XXXXXXXXXX" \
  -e EMAIL="email@gmail.com" \
  -e ADDRESS="xxxx.hopto.org:34567" \
  -e BANDWIDTH="61.5TB" \
  -e STORAGE="7TB" \
  --mount type=bind,source="/home/pi/SeagateCopy/storj_identity/identity/storagenode",destination=/app/identity \
  --mount type=bind,source="/media/Seagate_8T/Node_1",destination=/app/config \
  --name storagenode storjlabs/storagenode:beta

Thanks
Ilan

This is the error I get when I try to look at the dashboard:
Error response from daemon: Container 5159439a44e58a5988a2c2abafa9bcfd5c9590a436733c5a3f301ae932d8b4dc is restarting, wait until the container is running

Ilan

Can you check if your identity folder has 6 files in it?

Yes.
ca.1572022556.cert ca.key identity.cert
ca.cert identity.1572022556.cert identity.key

Ilan

How is your HDD connected ?

Hi.
It is connected through a USB 3 port.
Thanks
Ilan

@Alexey Your insight please.

" Linux Users: You must static mount via /etc/fstab. Failure to do so will put you in high risk of failing audits and getting disqualified."

https://documentation.storj.io/setup/cli/storage-node
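For reference, a static mount via /etc/fstab looks roughly like this (a sketch; the UUID below is a placeholder, take the real one from blkid, and the mount point is the one from your docker run command):

```shell
# find the drive's UUID (stable across reboots, unlike /dev/sda1)
sudo blkid /dev/sda1

# then add a line like this to /etc/fstab (placeholder UUID):
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /media/Seagate_8T  ext4  defaults  0  2

# verify it mounts without rebooting
sudo mount -a
```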


First you need to fix the connection problem: your node can't reach the satellites.
Second, check the connected drive; I have a feeling it's an NTFS filesystem on the Linux host.
If so, you should stop and remove the storagenode container, disconnect the drive, connect it to a Windows system and check it for errors, then safely eject it and connect it back to the Linux host.
I would also suggest migrating from NTFS to ext4 as soon as possible. For that you need to copy all the data somewhere else, format the drive as ext4, and copy all the data back.
If you lose the data, your node will be disqualified, so be careful.
We can discuss migration options in a separate thread, if needed.

And of course, you must make a static mount, as @peem mentioned.

Hi, @Alexey.
The node is connected. I can reach it.
The drive is not NTFS, it is formatted for Linux. This is what I get when I run the df -hT command:
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 15G 3.6G 11G 27% /
devtmpfs devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs tmpfs 2.0G 84K 2.0G 1% /dev/shm
tmpfs tmpfs 2.0G 8.5M 1.9G 1% /run
tmpfs tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mmcblk0p1 vfat 253M 54M 199M 22% /boot
/dev/sda1 ext4 7.3T 93M 6.9T 1% /media/Seagate_8T
tmpfs tmpfs 391M 0 391M 0% /run/user/1000

It is mounted as an fstab mount.
I had the same issue with the same machine some time ago, and you advised me then to reformat the hard drive from NTFS to a Linux-native filesystem. I did that, remounted the hard drive, deleted the old data, and started a new node with a new identity.
Now this node behaves the same way. I have three other nodes that are working without a hitch. They are all on the same network and connected to a UPS, so there was no sudden shutdown or anything like that.
I would like to solve this before I try another identity.
Thanks
Ilan

3 things stand out:

  • satellite ping issue
  • 2x data blobs being unreadable
  • container restarting (probably caused by the previous two)

I would first try to solve the satellite ping issue. you're using a dynamic DNS config for what i imagine is a floating WAN IP address. depending on what the ping check expects, e.g. a response from the node, your DDNS config may be breaking the NAT/port forwarding of the node's public port, preventing the satellites' pings from reaching the node.

i don't know what the severity of the unreadable blobs is, but i'd try stat-ing the mentioned blobs manually and also running file against them. perhaps someone who has been reading the forums more can say what the usual causes/fixes are for the "lstat {blob name}: bad message" errors.
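for what it's worth, a quick way to do that poking (blob paths copied from the log above; the mount point is guessed from your df output, so adjust if yours differs):

```shell
# "bad message" from lstat is EBADMSG, which on ext4 usually points at
# on-disk corruption rather than a storagenode bug. check the files directly:
BLOBS=/media/Seagate_8T/Node_1/storage/blobs

for f in \
  "$BLOBS/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/yhbpbjjinob74p3wf7utjp2nzacq6clcjlhn5hhbakeiof2ceq.sj1" \
  "$BLOBS/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/yk/2a2cmvnuudqazimxs3jnxaw2y7ton4q5ggdluzmgblggjekg2a.sj1"
do
  stat "$f" || echo "unreadable: $f"   # same lstat call the node makes
  file "$f" 2>/dev/null                # what the content looks like, if readable
done

# the kernel log usually names the underlying I/O error:
dmesg | grep -iE 'ext4|i/o error' | tail -n 20
```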

Get pricing-model/join date failed {"error": "context canceled"} is a non-issue, from what i remember, as long as your identity files are correct, which you've verified.

keep checking docker ps and the container log to see whether the container has stopped auto-restarting, so that you know you've cleared all the service's startup errors.

go to http://nmap.online-domain-tools.com and use a custom scan with "-v -Pn -p 34567 DNS/IP" to check public accessibility of port 34567 (the external port in your -p mapping).

also, in answering the query about the FS type of the drive you provided the output of df -hT. can you please provide the output of mount | grep -E "SeagateCopy|Seagate_8T" so that we can actually see the FS type and mount options of both of those drives.