Contact: service ping satellite failed

Just saw this error for the first time: “contact:service ping satellite failed”. Any thoughts?

2020-03-20T22:43:03.869Z ERROR contact:service ping satellite failed {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “attempts”: 1, “error”: “ping satellite error: rpccompat: dial tcp: lookup us-central-1.tardigrade.io on 192.168.1.1:53: read udp 172.17.0.3:46061->192.168.1.1:53: i/o timeout”, “errorVerbose”: “ping satellite error: rpccompat: dial tcp: lookup us-central-1.tardigrade.io on 192.168.1.1:53: read udp 172.17.0.3:46061->192.168.1.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:256\n\tstorj.io/common/rpc.Dialer.dial:233\n\tstorj.io/common/rpc.Dialer.DialAddressID:152\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:117\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:147\n\tstorj.io/common/sync2.(*Cycle).Start.func1:68\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2020-03-20T22:43:03.869Z ERROR contact:service ping satellite failed {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “attempts”: 1, “error”: “ping satellite error: rpccompat: dial tcp: lookup satellite.stefan-benten.de on 192.168.1.1:53: read udp 172.17.0.3:48269->192.168.1.1:53: i/o timeout”, “errorVerbose”: “ping satellite error: rpccompat: dial tcp: lookup satellite.stefan-benten.de on 192.168.1.1:53: read udp 172.17.0.3:48269->192.168.1.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:256\n\tstorj.io/common/rpc.Dialer.dial:233\n\tstorj.io/common/rpc.Dialer.DialAddressID:152\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:117\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:87\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:147\n\tstorj.io/common/sync2.(*Cycle).Start.func1:68\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}

After reading the exact error, it looks like your node is failing to get a proper reply via DNS for the us-central-1 satellite and Stefan’s testing satellite. What OS are you running and do you have DNS manually configured or is it just pulled from your router via DHCP?

Veddy

PS: @Alexey could you please split this discussion off from the sticky post? Thanks!

Raspbian, and I have DNS set up through No-IP with the DUC running. The node has been running for almost 8 months and I don’t recall ever seeing this error…

I meant local DNS. Dynamic DNS is great, but this is a failure on your node to resolve the IP addresses associated with the satellite hostnames. What DNS servers do you have that particular RPi pointed at? From what it looks like to me, it’s pointed at your router at 192.168.1.1 rather than something like Google DNS at 8.8.8.8 and 8.8.4.4. You can try running dig us-central-1.tardigrade.io on your RPi to see whether it was just a temporary failure to resolve or whether it’s still acting up.
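
For anyone who wants to see which lookups failed and which DNS server was asked, here is a rough sketch. It assumes the node runs in Docker and that the container is named storagenode (an assumption; use your own container name). failed_lookups is just a helper name made up for this example.

```shell
# Hypothetical helper: read storagenode log lines on stdin and print
# "hostname dns-server" for each failed DNS lookup, without duplicates.
failed_lookups() {
  grep -o 'lookup [^ ]* on [0-9.]*:53' \
    | awk '{ sub(/:53$/, "", $4); print $2, $4 }' \
    | sort -u
}

# Usage (assumes a container named "storagenode"):
#   docker logs storagenode 2>&1 | failed_lookups
#
# Then retry each lookup by hand. dig asks a DNS server for the IP address
# behind a hostname; "@server" picks which DNS server to ask:
#   dig +short us-central-1.tardigrade.io @192.168.1.1
#   dig +short us-central-1.tardigrade.io @8.8.8.8
```

If the query against 8.8.8.8 answers but the one against the router times out, the router’s DNS is the likely culprit. If both answer, it was probably a temporary hiccup.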

Veddy

Appreciate the assistance on this, although to be honest, I don’t know what you mean by “dig”…could you explain in layman’s terms?

Manually set the DNS servers on your router itself so your devices don’t use the router as the DNS server.

Would a DHCP reservation work just the same for the IP address?

No. Your router is failing to resolve the hostname to an IP address, so the ping fails. There should be a DNS setting on your router. Or you can change it manually on the RPi itself in the Ethernet settings.
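
To see which DNS servers the RPi is actually using before and after the change, something like the following may help. The helper name nameservers is made up for this sketch, and the dhcpcd and Docker lines in the comments are assumptions about a typical Raspbian/Docker setup, not something confirmed in this thread.

```shell
# Hypothetical helper: print the DNS servers listed in a resolv.conf-style
# file (defaults to /etc/resolv.conf).
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage:
#   nameservers
#
# Assumption: on Raspbian releases that use dhcpcd, DNS can be pinned by
# adding a line like this to /etc/dhcpcd.conf and then rebooting:
#   static domain_name_servers=8.8.8.8 8.8.4.4
#
# Assumption: for a node running in Docker, "docker run" also accepts --dns
# to set the resolver for that one container:
#   docker run ... --dns 8.8.8.8 --dns 8.8.4.4 ... storjlabs/storagenode:latest
```

If the output still shows only 192.168.1.1 after the change, the RPi is still asking the router.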

Okay, I’ll take a look at it then. I just screened through my logs and haven’t seen the error again. The time that it happened earlier today was right after a bandwidth rollup, although I don’t know enough to know if that was related or not.

It’s possible that your router can’t keep up with the DNS requests.

Hi,
just wondering if you had an update. I just ran into the same problem with my node (contact:service ping satellite failed) right after a bandwidth rollup. I restarted the node and now it always gives me that error just after startup, as well as preflight:localtime unable to get satellite system time.
I just set up a No-IP DNS and it’s been running for about an hour. I restarted my router just to double-check and it still gives me the same error. It looks like it can’t ping 3 servers (us central, europe west and asia east) but still manages to get traffic from the remaining 3.
Here is the log from when I restarted the node; it keeps trying to ping the servers:


I’m running my storagenode on Ubuntu with Docker. It’s about 3 weeks old and has been running perfectly until today.

Cheers,
Gab

The satellites are currently under maintenance.
https://status.tardigrade.io/incidents/slz29qswyw1t

Wow thanks a lot @peem that explains it.
I guess I took this storagenode thing a bit too seriously and started freaking out when I saw all these errors hahaha

I’ve tried digging through the forums to answer my issue and this seems like the best place to post. I’m trying to start a new node on an existing machine, but I’m getting some errors and the node is offline.

What I’ve done:

  • Created a new ID

  • Signed the id with a new auth code

  • confirmed that the ID was signed using the grep commands

  • updated port forwarding and firewall

  • confirmed such using https://www.yougetsignal.com/tools/open-ports/

  • ping google.com runs from the machine with no lost packets

  • created new node on docker using:

      sudo docker run -d --restart unless-stopped --privileged \
      --stop-timeout 300 \
      -p 28966:28967 \
      -p 14001:14002 \
      -e WALLET=<<omitted>> \
      -e EMAIL="<<omitted>>" \
      -e ADDRESS="<<omitted>>:28966" \
      -e STORAGE="500GB" \
      --mount type=bind,source="/mnt/hdd3/identity",destination=/app/identity \
      --mount type=bind,source="/mnt/hdd3/storage",destination=/app/config \
      --name storj-node-03 storjlabs/storagenode:latest
    
  • the UI works but is offline. When I check the logs I see:

    2020-12-04T03:49:07.586Z ERROR contact:service ping satellite failed {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “attempts”: 1, “error”: “ping satellite error: failed to dial storage node (ID: <>) at address electronharvester.ddns.net:28966: rpc: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID”, “errorVerbose”: “ping satellite error: failed to dial storage node (ID:<> ) at address <>.ddns.net:28966: rpc: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
    2020-12-04T03:49:07.835Z

This feels like a bad ID, but I’ve created and signed two IDs and keep getting the same issue. What are the next steps to resolve this? For the life of me I can’t seem to figure out what is wrong. My other node is unaffected by these issues.

How many nodes are you running on this pc?

How many identities do you have currently?

Did you create a new identity and sign the new identity?

I’ve seen a few cases where someone re-signed the old identity instead of the new one, but put the files for the new, unsigned identity into the node.
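
A quick way to check what is actually sitting in the identity folder the node mounts is to count the certificate blocks in each file, which is roughly what the grep commands mentioned earlier do. This is only a sketch: check_identity is a made-up helper name, and the expected counts (2 for ca.cert and 3 for identity.cert on a signed identity) are from my memory of the Storj docs, so please verify them against the current documentation.

```shell
# Hypothetical helper: count the certificate blocks in the identity files
# inside a given identity directory.
check_identity() {
  dir="$1"
  for f in ca.cert identity.cert; do
    if [ -f "$dir/$f" ]; then
      printf '%s %s\n' "$f" "$(grep -c BEGIN "$dir/$f")"
    else
      printf '%s missing\n' "$f"
    fi
  done
}

# Usage, with the identity path from the docker run command above:
#   check_identity /mnt/hdd3/identity
```

Running it against both the folder where the identity was signed and the folder the container mounts would show whether the signed files are the ones the node is using.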

I redid everything and the third time was the charm. Idk if I was in a rush before or what. I’m going to make another identity and try ports 28966 and 14001 again, since I don’t know what else I changed to get it to work.

UPDATE: I figured it out. My router (AT&T’s Arris BGW210-700) has both a port range and a base port in the firewall config. I expanded the port range to allow port forwarding on 28966-28968 but left the base port at 28967. Well, apparently that makes it overlook 28966.