Node says offline - Last contact 2000 years ago

robhodl · November 4, 2020, 3:26pm

Pretty much the title.

My node was disqualified from a few sats a few months back. Everything was running fine for the other sats until this weekend, when it started showing “Offline”.

On the web dashboard, the last contact says “17705582h 41m ago”, which translates to about 2021 years. So I’m thinking there’s an issue with the clock or something. Would this be a reason for the node to be offline?
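A quick sanity check on that number, as a rough sketch: subtracting the displayed duration from the approximate time of this post lands on January 1 of year 1. That would be consistent with a "last contact" timestamp that was never set rather than with a wrong system clock, though that reading is an assumption, not something the logs confirm. The observation time below is taken from the post header and its timezone is assumed.

```python
from datetime import datetime, timedelta

# Duration shown on the dashboard: "17705582h 41m ago"
elapsed = timedelta(hours=17705582, minutes=41)

# Approximate time of the post (assumption: timezone is not known)
observed_at = datetime(2020, 11, 4, 15, 26)

# Work backwards to the timestamp the dashboard is counting from
last_contact = observed_at - elapsed

# Convert the duration to years using the average Gregorian year length
years = elapsed.days / 365.2425

print("implied last contact:", last_contact)
print("elapsed years:", round(years, 1))
```

With 365-day years the same duration comes out closer to 2021 years; either way the implied date sits at the very start of the calendar.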

I’m running a docker-compose setup on an 8TB drive. Any thoughts?

Edit 1: Here are some snippets from the logs that indicate there may be a problem:

2020-11-02T14:12:34.715Z	INFO	Public server started on [::]:28967
2020-11-02T14:12:34.715Z	INFO	Private server started on 127.0.0.1:7778
2020-11-02T14:14:13.619Z	INFO	Got a signal from the OS: "terminated"
2020-11-02T14:14:13.667Z	ERROR	piecestore:cache	error getting current used space: 	{"error": "context canceled; context canceled; context canceled", "errorVerbose": "group:\n--- context canceled\n--- context canceled\n--- context canceled"}
2020-11-02T14:14:14.119Z	ERROR	servers	unexpected shutdown of a runner	{"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: debug: http: Server closed

2020-11-04T02:22:03.627Z	WARN	console:service	unable to get Satellite URL	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "storage node dashboard service error: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted", "errorVerbose": "storage node dashboard service error: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted\n\tstorj.io/storj/storagenode/trust.(*Pool).getInfo:228\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:167\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:168\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:44\n\tnet/http.HandlerFunc.ServeHTTP:2042\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2843\n\tnet/http.(*conn).serve:1925"}

Edit 2: Disk is mounted and data seems to be there. I am able to write to disk as well.

Edit 3: I am running many nodes on the same system. The other nodes are working fine; only this one has the problem.

deathlessdd · November 4, 2020, 3:30pm

The first thing to check is the logs, to see whether they give any idea why.

robhodl · November 4, 2020, 3:52pm

I added some error logs to the OP.

nerdatwork · November 4, 2020, 3:59pm

Check your disk for errors.

robhodl · November 4, 2020, 4:02pm

Disk is mounted and I can read/write. Anything else to check?

nerdatwork · November 4, 2020, 4:05pm

You should check it with fsck. The filesystem could have marked free space as allocated.

robhodl · November 4, 2020, 4:29pm

fsck found no errors on the partition.

$ sudo fsck -f /dev/sdb1
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
DISK LABEL: 4464220/244191232 files (2.5% non-contiguous), 1710702416/1953506304 blocks

nerdatwork · November 4, 2020, 4:39pm

What is your Docker version?

robhodl · November 4, 2020, 4:48pm

Docker version 19.03.13, build 4484c46d9d
I should also note that I am running many nodes on the same system. Only one node has this problem.

stuberman · November 5, 2020, 4:25am

Yeah, I hate when I oversleep, too…

Alexey · November 5, 2020, 7:58am

Please check the port for that specific node and make sure that it does not conflict with the ports used by the other nodes.
Also, please check that your firewall allows incoming traffic on that node’s port. Outgoing traffic should not be blocked.
I would also recommend checking the identity of that node, just in case.

It’s also worth checking the connection to the HDD, because if it drops out even for a moment, the storagenode will crash to prevent a disqualification.
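The port checks above could be sketched roughly like this. It is a minimal illustration, not a Storj tool: the node names and port numbers are hypothetical, and the two helper functions are made up for this example. Note that a connection test run from the same machine only shows that something is listening; confirming that the firewall and router forward the port still needs a test from outside the network.

```python
import socket
from collections import defaultdict


def find_port_conflicts(node_ports):
    """Return {port: [node names]} for every host port claimed by more than one node."""
    by_port = defaultdict(list)
    for name, port in node_ports.items():
        by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical host ports, as they might appear in each node's docker-compose file
nodes = {"node1": 28967, "node2": 28968, "node3": 28968}

conflicts = find_port_conflicts(nodes)
print("conflicting ports:", conflicts)
```

To probe a single node, something like `tcp_port_open("your.external.address", 28967)` run from a machine outside your network would show whether the port is reachable; the address here is a placeholder.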

robhodl · November 5, 2020, 4:35pm

This was the culprit! I can’t believe I didn’t check. I guess the “Last Contact” date threw me off; there was no reason the port would suddenly be closed… weird.

Node is back online and sending/storing pieces. Hopefully, reputation didn’t get hit too hard. Thanks for the help!