Node Offline... no idea why

itnok · August 31, 2020, 1:00pm

Yesterday I had to take down my node unexpectedly for about 2h.
In the meantime I had to change DNS and update the Docker container to version 1.10.1.
All operations went pretty smoothly, but after the restart my node is showing as offline.

The tail of the log looks like this:

2020-08-31T07:41:14.172Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2020-08-31T07:41:14.194Z        INFO    Operator email  {"Address": "<redacted>"}
2020-08-31T07:41:14.194Z        INFO    Operator wallet {"Address": "0x<redacted>"}
2020-08-31T07:41:15.486Z        INFO    Telemetry enabled
2020-08-31T07:41:15.630Z        INFO    db.migration    Database Version      {"version": 43}
2020-08-31T07:41:15.919Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2020-08-31T07:41:17.204Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2020-08-31T07:41:17.204Z        INFO    trust   Scheduling next refresh {"after": "8h23m36.709841089s"}
2020-08-31T07:41:17.205Z        INFO    Node <redacted> started
2020-08-31T07:41:17.205Z        INFO    Public server started on [::]:28967
2020-08-31T07:41:17.205Z        INFO    Private server started on 127.0.0.1:7778
2020-08-31T07:41:17.205Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-08-31T08:41:17.383Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-08-31T09:41:17.282Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-08-31T10:41:17.267Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-08-31T11:41:17.205Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-08-31T12:41:17.225Z        INFO    bandwidth       Performing bandwidth usage rollups
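
For anyone reading along: the log above contains only INFO lines, and after startup nothing but the hourly bandwidth rollups. A minimal sketch for tallying log levels and pulling out anything that is not INFO, assuming every line follows the `timestamp  level  message` layout shown above. The helper name `summarize_log` is made up for illustration and is not part of the storagenode tooling.

```python
import re
from collections import Counter

# Fields in the log excerpt are separated by tabs or runs of spaces.
FIELD_SEP = re.compile(r"\t+| {2,}")


def summarize_log(lines):
    """Count log lines per level and collect the messages that are not INFO."""
    levels = Counter()
    problems = []
    for line in lines:
        parts = FIELD_SEP.split(line.strip(), maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or unparseable lines
        _timestamp, level, message = parts
        levels[level] += 1
        if level != "INFO":
            problems.append(message)
    return levels, problems


# Three lines copied from the log excerpt above.
sample = [
    "2020-08-31T07:41:17.205Z        INFO    Public server started on [::]:28967",
    "2020-08-31T07:41:17.205Z        INFO    bandwidth       Performing bandwidth usage rollups",
    "2020-08-31T08:41:17.383Z        INFO    bandwidth       Performing bandwidth usage rollups",
]
levels, problems = summarize_log(sample)
print(dict(levels), problems)
```

An empty `problems` list only says the node itself logged no warnings or errors; it does not show whether the satellites can reach it.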

From the dashboard it looks like my last contact happened a little “too long ago”:

Any help would be greatly appreciated…

Thanks!

Vadim · August 31, 2020, 1:03pm

Firewall on Windows/Linux, port forwarding? Do you have a fixed IP or DDNS? If DDNS, did you install the update tool or configure it on the router?

nerdatwork · August 31, 2020, 1:10pm

This should help and do update this thread.

itnok · August 31, 2020, 1:31pm

Port 28967 is open (checked with suggested link)
Fixed IP and custom DNS. DNS correctly points to my IP. The firewall did not change, and it was working before.

Not really sure where/what to look for more than this…
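
The "DNS points to my IP" check can also be done from a script. A rough sketch, assuming the node is addressed by a DNS name; the function name `resolves_to` and the example hostname and IP in the comment are placeholders, not values from this thread.

```python
import socket


def resolves_to(hostname, expected_ip):
    """Return True if any address the hostname resolves to equals expected_ip."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry ends with a sockaddr tuple whose first item is the IP address.
    addresses = {info[4][0] for info in infos}
    return expected_ip in addresses


# Placeholders: replace with the node's DNS name and its public IP.
# print(resolves_to("node.example.com", "203.0.113.10"))
```

This asks the local resolver only, so a stale cached record elsewhere would not show up here; running it from a machine outside the LAN gives a more useful answer.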

nerdatwork · August 31, 2020, 1:39pm

Check path to identity and the folder should have 6 files in it.

deathlessdd · August 31, 2020, 1:50pm

Do you have more than one node?

itnok · August 31, 2020, 5:36pm

I have a single node.

immiq · August 31, 2020, 10:16pm

This just happened to one of my nodes. I took it down for a couple of hours and when I brought it back online the dashboard reported that it is offline.

https://www.yougetsignal.com/ reports the port as opened.

Other nodes on the same network are operational.

Alexey · August 31, 2020, 10:46pm

Please, check your identity: https://documentation.storj.io/dependencies/identity#confirm-the-identity

immiq · August 31, 2020, 11:05pm

It seems things are fine on that end …

grep -c BEGIN ca.cert returns 2

grep -c BEGIN identity.cert returns 3
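
The same check as the two `grep -c BEGIN` commands, as a small script. This is only a sketch: the expected counts (2 and 3) are the ones reported in this thread, and the function names and the `identity_dir` argument are made up for illustration.

```python
from pathlib import Path

# Expected counts taken from this thread: ca.cert -> 2, identity.cert -> 3.
EXPECTED_BEGIN_LINES = {"ca.cert": 2, "identity.cert": 3}


def count_begin_lines(path):
    """Count lines containing 'BEGIN', like `grep -c BEGIN <file>`."""
    text = Path(path).read_text(errors="replace")
    return sum(1 for line in text.splitlines() if "BEGIN" in line)


def check_identity(identity_dir):
    """Map each cert file name to (actual count, expected count, matches?)."""
    results = {}
    for name, expected in EXPECTED_BEGIN_LINES.items():
        actual = count_begin_lines(Path(identity_dir) / name)
        results[name] = (actual, expected, actual == expected)
    return results
```

Call `check_identity()` with the path to the identity folder; a missing file raises `FileNotFoundError`, which is itself a useful signal.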

immiq · August 31, 2020, 11:07pm

These are the logs:

2020-08-31T22:23:19.865Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2020-08-31T22:23:19.880Z	INFO	Operator email	{"Address": "REDACTED"}
2020-08-31T22:23:19.880Z	INFO	Operator wallet	{"Address": "REDACTED"}
2020-08-31T22:23:20.204Z	INFO	Telemetry enabled
2020-08-31T22:23:20.217Z	INFO	db.migration	Database Version	{"version": 43}
2020-08-31T22:23:20.824Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.
2020-08-31T22:23:21.391Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.
2020-08-31T22:23:21.391Z	INFO	bandwidth	Performing bandwidth usage rollups
2020-08-31T22:23:21.391Z	INFO	trust	Scheduling next refresh	{"after": "5h7m57.232603171s"}
2020-08-31T22:23:21.392Z	INFO	Node [REDACTED] started
2020-08-31T22:23:21.392Z	INFO	Public server started on [::]:28967
2020-08-31T22:23:21.392Z	INFO	Private server started on 127.0.0.1:7778

Also:

Suspension Score = 100%
Audit Score = 100%

immiq · August 31, 2020, 11:09pm

One thing to note is that this node was moved from an arm device to amd64, though I don’t think that’s an issue, as I have done the same in the past.

Update: moved the hard drive back to the arm device and the node came back online.

Alexey · September 1, 2020, 8:19am

If you moved the node but didn’t change the port forwarding rule, that will be an issue: each device has its own local IP, so you should update the rule too.

immiq · September 1, 2020, 1:39pm

I did update the port and created a new forwarding rule.

Alexey · September 1, 2020, 6:43pm

Then please check what else is different in the network configuration. Perhaps the second system has an integrated firewall; in that case you should create an inbound rule for the port. Outbound access needs to be allowed too.
Second, when you moved the disk with the data, did you also move the identity tied to it?

itnok · September 2, 2020, 7:42pm

@Alexey in my case the node is still OFFLINE after several days, despite being connected and the Docker container being up and running.

Checked the identity as suggested and can confirm it is good (returns 2 & 3 as expected)
Checked the firewall/router port forward both manually and via https://www.yougetsignal.com/tools/open-ports/?port=28967 as suggested (have fixed IP and dealt manually with DNS zone… no DynDNS)
The node was moved to a new location, but the ISP is the same (basically just different IP… should not be an issue)
The node has always been running Dockerized and on the same hardware (just shut it down, moved it and fired it back up)
Nothing else in the network infrastructure changed (same router, same firewall, same everything… only port forward had to be configured again on the new ISP modem/router… but it does work)
Other people (or myself, for that matter) can access my infrastructure remotely, from outside my LAN

Do you have any further suggestions to help me avoid unnecessary downtime and stop being penalized for… no reason?

Any suggestion would be greatly appreciated…

Thanks

Alexey · September 2, 2020, 7:49pm

Can you try to shut down the node and check your port? Is it closed?

itnok · September 2, 2020, 8:27pm

Good point @Alexey, and the answer is yes, the port shows as closed using https://www.yougetsignal.com/tools/open-ports/?port=28967 when the Docker container is down.

Bringing the node back up did not change anything, though:

Port now shows as open (of course)
Node is still reported as OFFLINE
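
For reference, the open/closed test done with the web tool boils down to a plain TCP connect. A minimal sketch, with the function name `port_is_open` made up for illustration; it has to be run from a machine outside the LAN to say anything about port forwarding.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Placeholder address: replace with the node's external address.
# print(port_is_open("node.example.com", 28967))
```

As the results above show, a successful connect only proves that something accepts TCP connections on that port; it does not prove the satellites see the node as online.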

Alexey · September 2, 2020, 8:45pm

Please, try a different browser.
Also, your ISP could be applying a filter to your traffic.
Please, take a look:

itnok · September 2, 2020, 9:00pm

I’m not sure I’m following you. What do you mean by trying another browser?

I’m checking the node directly from the CLI; no browser is involved:

Storage Node Dashboard ( Node Version: v1.10.1 )

======================

ID           <redacted>
Last Contact OFFLINE
Uptime       30m49s

                   Available       Used     Egress     Ingress
     Bandwidth           N/A        0 B        0 B         0 B (since Sep 1)
          Disk        2.9 TB     6.1 TB
Internal 127.0.0.1:7778
External <redacted>:28967

My data is uncapped and I have FTTH 1Gbps/1Gbps (currently measuring 995Mbps/975Mbps).