Node update can cause a FATAL in logs

Pac · October 5, 2020, 9:21pm

Hello

I noticed a high I/O activity on my RPi 4B hosting my nodes, and found out that one of my nodes (v1.13.3) restarted unexpectedly:

...
2020-10-05T19:14:25.367Z        FATAL   Unrecoverable error     {"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-05T19:14:37.217Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
...

This happened only once since the beginning of the month (I didn’t check older logs), and it seems the node is simply rescanning all files and resumed its life normally…

Should I be concerned?

nerdatwork · October 5, 2020, 9:26pm

Are you sure it didn’t restart to update ?

Pac · October 5, 2020, 9:42pm

You are right indeed! Hum…

Okay then, but still… updates can cause FATALs?!

Toyoo · October 5, 2020, 9:54pm

Yeah, another case of miscategorized log entry…

stefanbenten · October 5, 2020, 10:07pm

That is not correct. If the container is forcefully stopped (timeout settings!) then this is indeed a fatal error.

kevink · October 6, 2020, 7:12am

iirc it throws a fatal error every time I restarted or shut down my node and it was never forcefully stopped (unless there is a bug in the shutdown process).

Pac · October 6, 2020, 8:18am

I did run a test, and in fact I’m experiencing what @kevink described:

Stop de Node gracefully (with a 10 minute timeout):

$> time docker stop -t 600 storj_node_4
Stopping 'storj_node_4'...
storj_node_4

real    0m2.729s
user    0m0.060s
sys     0m0.079s

Here is the corresponding log:

...
2020-10-06T07:57:47.521Z	INFO	piecestore	upload started	{"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 906604904832}
2020-10-06T07:57:47.603Z	INFO	piecestore	uploaded	{"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-10-06T07:57:50.134Z	INFO	Got a signal from the OS: "terminated"
2020-10-06T07:57:50.134Z	INFO	contact:service	context cancelled	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-10-06T07:57:50.635Z	ERROR	servers	unexpected shutdown of a runner	{"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-06T07:57:51.491Z	FATAL	Unrecoverable error	{"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

So it seems a FATAL entry does appear even though the Node gets shut down gracefully.
Maybe there’s something to be fixed here?

Besides, if the timeout were triggered (which isn’t the case above), docker would kill the container and the Node storage wouldn’t have the time to issue any FATAL log entry at all, would it?

Alexey · October 6, 2020, 9:13pm

You can easy fix that - uncomment the debug.addr: option in the config.yaml, specify port (5999 for example)

debug.addr: 127.0.0.1:5999

Save the config and restart the node.
If you use a docker, you can map this port to external to be able to use a debug service

Pac · October 6, 2020, 9:18pm

I don’t get how this is related to the fact a fatal error is raised when shutting down a node?

Or do you suggest I use monkit to investigate further?

Alexey · October 6, 2020, 9:21pm

No, just specify a port
It’s :0 by default, which is invoke an error on stop of the service.

Pac · October 6, 2020, 9:29pm

Aah right! Thx!

But still, the node software could handle that scenario (no port) to avoid a scary fatal error in the logs

Alexey · October 6, 2020, 9:56pm

Yes, there is an internal issue in the backlog

kevink · October 7, 2020, 4:51am

I have this in my config.yaml and port rule in docker run:

debug.addr: :7777

I still get a FATAL error on shutdown.

nerdatwork · October 7, 2020, 4:54am

IIRC the port was changed to 5999 after API endpoints were changed.

kevink · October 7, 2020, 4:56am

This config is to specify the port. (which works btw) It doesn’t need to be 5999

nerdatwork · October 7, 2020, 4:59am

I have this set when the guide was updated yet I still see FATAL error when container is stopped gracefully.

Alexey · October 7, 2020, 6:44am

It is just should not be 7777 (if I remember correctly it is a default port for the server, if not specified)

Alexey · October 7, 2020, 6:46am

So, you have specified a debug port and the line is uncommented and you still have such error on shutdown?

nerdatwork · October 7, 2020, 7:47am

Yes.

Alexey · October 7, 2020, 8:23am

Thanks for confirmation!
Updated an issue that workaround doesn’t work anymore.