Contact: service ping satellite failed

dragonhogan · March 20, 2020, 10:49pm

Just saw this error for the first time: “contact: service ping satellite failed”. Any thoughts?

2020-03-20T22:43:03.869Z	ERROR	contact:service	ping satellite failed	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite error: rpccompat: dial tcp: lookup us-central-1.tardigrade.io on 192.168.1.1:53: read udp 172.17.0.3:46061->192.168.1.1:53: i/o timeout", "errorVerbose": "ping satellite error: rpccompat: dial tcp: lookup us-central-1.tardigrade.io on 192.168.1.1:53: read udp 172.17.0.3:46061->192.168.1.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:256\n\tstorj.io/common/rpc.Dialer.dial:233\n\tstorj.io/common/rpc.Dialer.DialAddressID:152\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:117\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:147\n\tstorj.io/common/sync2.(*Cycle).Start.func1:68\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-03-20T22:43:03.869Z	ERROR	contact:service	ping satellite failed	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "attempts": 1, "error": "ping satellite error: rpccompat: dial tcp: lookup satellite.stefan-benten.de on 192.168.1.1:53: read udp 172.17.0.3:48269->192.168.1.1:53: i/o timeout", "errorVerbose": "ping satellite error: rpccompat: dial tcp: lookup satellite.stefan-benten.de on 192.168.1.1:53: read udp 172.17.0.3:48269->192.168.1.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:256\n\tstorj.io/common/rpc.Dialer.dial:233\n\tstorj.io/common/rpc.Dialer.DialAddressID:152\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:117\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:147\n\tstorj.io/common/sync2.(*Cycle).Start.func1:68\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

vedalken254 · March 21, 2020, 12:57am

After reading the exact error, it looks like your node is failing to get a proper reply via DNS for the us-central-1 satellite and Stefan’s testing satellite. What OS are you running and do you have DNS manually configured or is it just pulled from your router via DHCP?

Veddy

PS: @Alexey, could you please split this discussion off from the sticky post? Thanks!

dragonhogan · March 21, 2020, 1:13am

Raspbian, and I have DNS set up through No-IP with the DUC running. The node has been running for almost 8 months and I don’t recall ever seeing this error…

vedalken254 · March 21, 2020, 1:26am

I meant local DNS. Dynamic DNS is great, but this is a failure on your node to resolve the IP addresses associated with satellite hostnames. Which DNS servers is that particular RPi pointed at? From what I can see in the log, it’s pointed at your router at 192.168.1.1 rather than something like Google DNS at 8.8.8.8 and 8.8.4.4. You can try running dig us-central-1.tardigrade.io on your RPi to see whether it was just a temporary failure to resolve or whether it’s still acting up.
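
For anyone unfamiliar with it, dig is a command-line DNS lookup tool (on Raspbian it normally comes from the dnsutils package). The sketch below pulls the hostname and resolver out of the "lookup ... on ...:53" part of the error, so you know exactly what to test. The function name failed_lookups and the container name in the usage comment are assumptions for illustration, not anything from the Storj tooling.

```shell
#!/bin/sh
# Print each unique "hostname resolver" pair found in DNS lookup failures.
# Reads storagenode log text on stdin.
failed_lookups() {
  grep -o 'lookup [^ ]* on [0-9.]*:53' \
    | sed -E 's/^lookup ([^ ]+) on ([0-9.]+):53$/\1 \2/' \
    | sort -u
}

# Usage (the container name "storagenode" is an assumption):
#   docker logs storagenode 2>&1 | failed_lookups
#
# Then ask both resolvers for the same name and compare the answers:
#   dig @192.168.1.1 us-central-1.tardigrade.io
#   dig @8.8.8.8 us-central-1.tardigrade.io
```

If the router times out while the public resolver answers, the problem is most likely the local resolver rather than the node or the satellite.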

Veddy

dragonhogan · March 21, 2020, 2:38am

Appreciate the assistance on this, although to be honest, I don’t know what you mean by “dig”…could you explain in layman’s terms?

deathlessdd · March 21, 2020, 2:44am

Set the DNS servers manually in your router’s settings so that your devices don’t use the router itself as the DNS server.

dragonhogan · March 21, 2020, 2:52am

Would a DHCP reservation work just the same for the IP address?

deathlessdd · March 21, 2020, 2:54am

No. Your router is failing to resolve the satellite’s hostname to an IP address, so the node can’t ping it. There should be a DNS setting on your router, or you can change it manually on the RPi itself in the Ethernet settings.
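
Before changing anything, it can help to see which resolvers the machine is actually using. This is a minimal sketch that lists the nameserver entries from a resolv.conf-style file; the helper name list_nameservers is made up for illustration. The comments mention two possible ways to override DNS (Docker's --dns flag and a dhcpcd.conf line on Raspbian); treat both as suggestions to check against your own setup rather than a confirmed fix for this thread.

```shell
#!/bin/sh
# Print the nameserver addresses configured in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage on the host:
#   list_nameservers
#
# If the only entry is the router (192.168.1.1 in the log above), two
# possible overrides, both assumptions about your setup:
#   - container only: add "--dns 8.8.8.8" to the docker run command
#   - Raspbian host with dhcpcd: in /etc/dhcpcd.conf, under the interface,
#       static domain_name_servers=8.8.8.8 8.8.4.4
```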

dragonhogan · March 21, 2020, 2:55am

Okay, I’ll take a look at it then. I just screened through my logs and haven’t seen the error again. The time that it happened earlier today was right after a bandwidth rollup, although I don’t know enough to know if that was related or not.

deathlessdd · March 21, 2020, 2:56am

It’s possible that your router can’t keep up with the DNS requests.

TheMightyGreek · June 21, 2020, 9:17am

Hi,
just wondering if you had an update. I just ran into the same problem with my node (contact:service ping satellite failed) right after a bandwidth rollup. I restarted the node, and now it gives me that error every time just after startup, along with “preflight:localtime unable to get satellite system time”.
I just set up a No-IP DNS and it’s been running for about an hour. I restarted my router just to double check, and it still gives me the same error. It looks like it can’t ping 3 servers (us central, europe west and asia east) but still manages to get traffic from the remaining 3.
Here is the log from when I restarted the node, it keeps trying to ping the servers

I’m running my storagenode on Ubuntu with Docker. It’s about 3 weeks old and has been running perfectly until today.

Cheers,
Gab

peem · June 21, 2020, 9:35am

The satellites are currently under maintenance.
https://status.tardigrade.io/incidents/slz29qswyw1t

TheMightyGreek · June 21, 2020, 9:38am

Wow thanks a lot @peem that explains it.
I guess I took this storagenode thing a bit too seriously and started freaking out when I saw all these errors hahaha

ElectronHarvester · December 4, 2020, 4:33am

I’ve tried digging through the forums to answer my issue, and this seems like the best place to post. I’m trying to start a new node on an existing machine, but I’m getting some errors and the node is offline.

What I’ve done:

Created a new ID
Signed the ID with a new auth code
Confirmed that the ID was signed using the grep commands
Updated port forwarding and firewall
Confirmed this using https://www.yougetsignal.com/tools/open-ports/
Verified that ping google.com runs from the machine with no lost packets

Created a new node on Docker using:

  sudo docker run -d --restart unless-stopped --privileged \
  --stop-timeout 300 \
  -p 28966:28967 \
  -p 14001:14002 \
  -e WALLET="<<omitted>>" \
  -e EMAIL="<<omitted>>" \
  -e ADDRESS="<<omitted>>:28966" \
  -e STORAGE="500GB" \
  --mount type=bind,source="/mnt/hdd3/identity",destination=/app/identity \
  --mount type=bind,source="/mnt/hdd3/storage",destination=/app/config \
  --name storj-node-03 storjlabs/storagenode:latest

The UI works but shows the node as offline. When I check the logs I see:

2020-12-04T03:49:07.586Z ERROR contact:service ping satellite failed {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite error: failed to dial storage node (ID: <>) at address electronharvester.ddns.net:28966: rpc: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID", "errorVerbose": "ping satellite error: failed to dial storage node (ID:<> ) at address <>.ddns.net:28966: rpc: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-12-04T03:49:07.835Z

This feels like a bad ID, but I’ve created and signed two IDs and keep getting the same issue. What are the next steps to resolve this? For the life of me I can’t seem to figure out what is wrong. My other node is unaffected by these issues.

deathlessdd · December 4, 2020, 12:09pm

How many nodes are you running on this pc?

How many identities do you currently have?

Did you create a new identity and sign the new identity?

I’ve seen a few cases where someone re-signed the old identity instead of the new one, but then put the files for the new, unsigned identity into the node.
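
To rule that out, the grep check mentioned above can be pointed at the exact directory the container mounts. This is only a sketch: the helper name identity_is_signed is made up, and the expected counts (2 BEGIN lines in ca.cert, 3 in identity.cert for a signed identity) are what I remember from the Storj setup documentation, so please verify them against the current docs.

```shell
#!/bin/sh
# Succeeds when the identity directory looks signed, based on the number of
# BEGIN lines in its certificate files.
# Assumption: signed means 2 in ca.cert and 3 in identity.cert.
identity_is_signed() {
  dir=$1
  ca=$(grep -c BEGIN "$dir/ca.cert" 2>/dev/null || true)
  id=$(grep -c BEGIN "$dir/identity.cert" 2>/dev/null || true)
  [ "${ca:-0}" -eq 2 ] && [ "${id:-0}" -eq 3 ]
}

# Usage, with the identity path from the docker run command in this thread:
#   identity_is_signed /mnt/hdd3/identity && echo signed || echo "not signed"
```

Checking the mounted directory rather than the default identity location matters here, because the mix-up described above is precisely about which files ended up in the node.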

ElectronHarvester · December 4, 2020, 8:45pm

I redid everything and the third time was the charm. Idk if I was in a rush before or what. I’m going to make another identity and try ports 28966 and 14001 again. I don’t know what else I changed to get it to work.

UPDATE: I figured it out. My router (AT&T’s Arris BGW210-700) has both a port range and a base port in the firewall config. I expanded the port range to allow port forwarding on 28966-28968 but left the base port at 28967. Well, apparently that makes it overlook 28966.
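
For a second node on a non-default port, the whole chain has to line up: the port in ADDRESS, the port the router forwards, and the host side of the docker -p mapping (the left number), which Docker then maps to 28967 inside the container. The sketch below only checks the Docker side of that chain; ports_match is a hypothetical helper and example.ddns.net is a placeholder hostname. The router rule still has to be checked by hand.

```shell
#!/bin/sh
# Succeeds when the port advertised in ADDRESS equals the host side of a
# docker "-p HOST:CONTAINER" mapping.
ports_match() {
  address=$1   # e.g. example.ddns.net:28966 (placeholder hostname)
  mapping=$2   # e.g. 28966:28967
  [ "${address##*:}" = "${mapping%%:*}" ]
}

# Usage:
#   ports_match "example.ddns.net:28966" "28966:28967" && echo "ports agree"
```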