ERROR contact:service ping satellite failed?

How bad is this error, and what can I do to fix it? One of my nodes' logs shows mostly this error, and the node is not getting much traffic.

2022-01-10T11:49:35.572Z ERROR contact:service ping satellite failed {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "attempts": 10, "error": "ping satellite: rpc: dial tcp: i/o timeout", "errorVerbose": "ping satellite: rpc: dial tcp: i/o timeout\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:114\n\tstorj.io/common/rpc.TCPConnector.DialContext:78\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:220\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:110\n\tstorj.io/common/rpc/rpcpool.(*Pool).get:105\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:128\n\tstorj.io/common/rpc.Dialer.dialPool:186\n\tstorj.io/common/rpc.Dialer.DialNodeURL:109\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

When I stop the container and remove it, then add it again, I get this error:

docker: Error response from daemon: driver failed programming external connectivity on endpoint storagenode (b770b6e195b664bd0751220e12ada24cd831e618c5644941fc81269fc80e37bf): (iptables failed: iptables --wait -t filter -A DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.4 --dport 28967 -j ACCEPT: iptables: No chain/target/match by that name.

!?

hi @svet0slav
This is bad: it means the satellite considers the storage node offline. There is an issue somewhere with the node itself, the port, the port forwarding, the router, the internet connection, or even DNS.
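If DNS is one of the suspects, a quick sanity check (shown here with node3.domain.com, the node address that appears later in this thread, as a stand-in) is to confirm the hostname still resolves to your current public IP:

nslookup node3.domain.com
curl -s https://ifconfig.me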

The second error is a local docker/firewall issue.

The fix should be this. I am rebooting the machine to see what happens, because resetting the firewall did not help:

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
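Note that flushing and deleting chains like this also wipes the chains Docker manages, which is exactly what the "No chain/target/match by that name" error complains about. Restarting the Docker daemon should recreate them (assuming a systemd-based install):

sudo systemctl restart docker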

I don’t understand. It should work. Ports are open.

OK… I see this in the boot logs

systemd-udevd[4611]: veth9c4b829: Failed to get link config: No such device

Why did the other veth interfaces related to the other docker containers come up, but not this one!? This looks like some systemd problem.

I have them too, and am ignoring them for now. They have had no impact on my scores so far.

It might have something to do with the ping speed / answer speed to the download / upload request.

It is somehow iptables-related, but I can't figure it out yet. Docker has weird ways of messing with iptables; it maintains its own chains.
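For reference, a default Docker installation maintains its own chains in both the filter and nat tables; listing them shows whether they survived the firewall reset:

iptables -L DOCKER -n -v
iptables -L DOCKER-USER -n -v
iptables -t nat -L DOCKER -n -v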

Any STORJ staff around??

Would appreciate some help from STORJ staff. Nodes are already set up, but docker/iptables problems prevent them from working. How does this affect reputation?

Update…

New error…

2022-01-10T15:26:54.027Z ERROR contact:service ping satellite failed {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 9, "error": "ping satellite: failed to dial storage node (ID: XXX) at address node3.domain.com:28969: rpc: context deadline exceeded", "errorVerbose": "ping satellite: failed to dial storage node (ID: XXX) at address node3.domain.com:28969: rpc: context deadline exceeded\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

Ports are open. Why is it doing this?

One more error popping up…

2022-01-10T15:31:30.610Z ERROR contact:service ping satellite failed {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "attempts": 2, "error": "ping satellite: check-in ratelimit: node rate limited by id", "errorVerbose": "ping satellite: check-in ratelimit: node rate limited by id\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:138\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

This looks like the node still can't be reached by the satellites. You could try this community-built tool to ping the nodes, since it will try to initiate an RPC to the node to tell whether the node itself is responding, whereas a simple ping tool will only tell you if the port is open and something is answering.

https://storjnet.info/ping_my_node


ping…

  • started
  • QUIC: dialed node in 41ms
  • QUIC: pinged node in 37ms
  • QUIC: total: 78ms
  • TCP: couldn’t connect to node: rpc: context deadline exceeded

dial…

  • started
  • QUIC: dialed node in 44ms
  • TCP: couldn’t connect to node: rpc: context deadline exceeded

Looks like UDP forwarding is working but TCP forwarding is not.
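As an extra check of raw TCP reachability from another machine (assuming netcat is installed, and using the hostname and port from the log lines above), this should show whether the TCP handshake even completes:

nc -vz node3.domain.com 28969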


Hmm… My start command looks like this:

docker run -d --restart unless-stopped --stop-timeout 300 -p 11.22.33.44:28969:28967/tcp -p 11.22.33.44:28969:28967/udp -p 11.22.33.44:14002:14002 -e WALLET="XXX" -e EMAIL="mail@domain.com" -e ADDRESS="node3.domain.com:28969" -e STORAGE="14TB" --mount type=bind,source="/home/user/.local/share/storj/identity/node3",destination=/app/identity --mount type=bind,source="/storage/STORJ/node3",destination=/app/config --name node3 storjlabs/storagenode:latest

Ports are open, but docker does not want to let the container connect to the outside world…

CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                                                                                                  NAMES
49fd73d0b1b5   storjlabs/storagenode:latest   "/entrypoint"            38 minutes ago   Up 28 minutes   11.22.33.44:14002->14002/tcp, 11.22.33.44:28969->28967/tcp, 11.22.33.44:28969->28967/udp   node3

The other nodes on the same machine are configured in the very same way, just with a different port/subnet/IP, node name, address, and folder… Checkmate!

This is what bothers me the most. Why does it even happen?

Is it at all possible that TCP port 28969 is already being used by something else on the network? At this point I would try using some other high number port just to see what happens.
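For completeness, a quick way to see whether anything on the host is already listening on that TCP port (assuming ss or lsof is installed):

sudo ss -tlnp | grep 28969
sudo lsof -i TCP:28969

Note that while the container is up, docker-proxy itself may show up as the listener, which is expected.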

I appreciate that you are trying to help, but this is not it. I checked with lsof already. I even tried setting up docker networks, but when I create the docker containers with a specific network and IP, it says the address is in use, BUT IT IS NOT. I tried many things to no avail. What seems to work with some nodes is to specify the IP of the node directly, not just the port.

Like the full command I posted above, with the host IP in the -p options, instead of just:

docker run -d --restart unless-stopped --stop-timeout 300 -p 28969:28967/tcp -p 28969:28967/udp -p 14002:14002

I'm pretty sure this issue is some docker ↔ iptables thing, but I'm not sure how docker screws up iptables. Even if I completely disable the firewall, the issue persists across reboots of the machine with multiple nodes. Some nodes start, some do not, at random.


I was just about to say that you don’t normally need to specify the IP of the node in the -p parameter of the docker run command. My understanding of iptables/local networking is superficial at best, but I think by specifying the node IP you are telling the node to bind those ports to the loopback interface instead of the external interface?
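For reference, the two publish forms differ only in which host interface Docker binds the published port to (ports here are the ones from this thread):

# publish on all host interfaces (0.0.0.0)
-p 28969:28967/tcp
# publish only on one specific host IP
-p 11.22.33.44:28969:28967/tcp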

I am very confused. I thought I understood Linux networking before I met docker… :rofl:

4 network card ports, 4 different IPs, each on a different /24. All IPs are pingable from the outside world. Ports are open. This is some docker/iptables thing. Docker does not let the container communicate with the outside world, even though the machine's network and config allow it.

No firewall is enabled. When I enable the firewall, nothing changes: the other nodes work, but not this node.
When I disable the firewall, I can't start this node, because disabling it deletes the docker chain…

docker: Error response from daemon: driver failed programming external connectivity on endpoint storagenode (3ceabf1abd47cc2585296527a009f8b38dcf3c81a7293d30bb6b9936b3f4aabd): (iptables failed: iptables --wait -t filter -A DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.6 --dport 28967 -j ACCEPT: iptables: No chain/target/match by that name.

I restart the server and then some random nodes work and some do not: the same issue, but on other nodes. I do not know what to say, except that it points me to docker's iptables screwups.

veth… is the Docker network interface

You can check which interfaces are listed on the host machine by:

ip a | grep '^[0-9]:'

There should be a veth corresponding to docker.

If there is not, maybe docker isn’t running.

You can see if docker is running and what is running in docker by:

docker ps

which should show the storagenode and watchtower and whatever other images you happen to be running.

You can check your iptables for docker configuration by:

iptables -L

which should give you several listings related to docker.


All the above assumes you’ve installed docker in some sort of normal way and are root on the machine.


I know. The funny thing that happens is…

Docker is running. The node is running, the IP is pingable from the outside world, the ports are open, the docker container is running, but it cannot communicate with the outside world.

I even tried adding the vethXXXXXXX manually and bringing it up to no avail. :crazy_face:

I am reinstalling the system and creating the nodes one by one again, to see what happens…

The veth… interface is brought up and down by the docker daemon.

If you stop the docker daemon the veth… interface drops off the machine list, and the docker0 interface is listed as down.

You can check which containers are connected to which docker interface by:

docker network ls

should list the network interfaces.

docker network inspect bridge

should list what containers are connected to what docker addresses.
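A more targeted variant (using the --format option with a Go template) prints only the containers attached to the default bridge network:

docker network inspect bridge --format '{{json .Containers}}'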