Node update can cause a FATAL in logs

Hello :slight_smile:

I noticed a high I/O activity on my RPi 4B hosting my nodes, and found out that one of my nodes (v1.13.3) restarted unexpectedly:

...
2020-10-05T19:14:25.367Z        FATAL   Unrecoverable error     {"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-05T19:14:37.217Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
...

This happened only once since the beginning of the month (I didn’t check older logs), and it seems the node is simply rescanning all files and resumed its life normally…

Should I be concerned?

Are you sure it didn’t restart to update ?

1 Like

You are right indeed! Hum… :sweat_smile:

Okay then, but still… updates can cause FATALs?!

Yeah, another case of miscategorized log entry…

That is not correct. If the container is forcefully stopped (timeout settings!) then this is indeed a fatal error.

1 Like

iirc it throws a fatal error every time I restarted or shut down my node and it was never forcefully stopped (unless there is a bug in the shutdown process).

1 Like

I did run a test, and in fact I’m experiencing what @kevink described:

  1. Stop de Node gracefully (with a 10 minute timeout):

    $> time docker stop -t 600 storj_node_4
    Stopping 'storj_node_4'...
    storj_node_4
    
    real    0m2.729s
    user    0m0.060s
    sys     0m0.079s
    
  2. Here is the corresponding log:

    ...
    2020-10-06T07:57:47.521Z	INFO	piecestore	upload started	{"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 906604904832}
    2020-10-06T07:57:47.603Z	INFO	piecestore	uploaded	{"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
    2020-10-06T07:57:50.134Z	INFO	Got a signal from the OS: "terminated"
    2020-10-06T07:57:50.134Z	INFO	contact:service	context cancelled	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
    2020-10-06T07:57:50.635Z	ERROR	servers	unexpected shutdown of a runner	{"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
    2020-10-06T07:57:51.491Z	FATAL	Unrecoverable error	{"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
    

So it seems a FATAL entry does appear even though the Node gets shut down gracefully.
Maybe there’s something to be fixed here?

Besides, if the timeout were triggered (which isn’t the case above), docker would kill the container and the Node storage wouldn’t have the time to issue any FATAL log entry at all, would it?

You can easy fix that - uncomment the debug.addr: option in the config.yaml, specify port (5999 for example)

debug.addr: 127.0.0.1:5999

Save the config and restart the node.
If you use a docker, you can map this port to external to be able to use a debug service

I don’t get how this is related to the fact a fatal error is raised when shutting down a node? :thinking:

Or do you suggest I use monkit to investigate further?

No, just specify a port :slight_smile:
It’s :0 by default, which is invoke an error on stop of the service.

Aah right! :slight_smile: Thx!

But still, the node software could handle that scenario (no port) to avoid a scary fatal error in the logs :wink:

2 Likes

Yes, there is an internal issue in the backlog

2 Likes

I have this in my config.yaml and port rule in docker run:

debug.addr: :7777

I still get a FATAL error on shutdown.

IIRC the port was changed to 5999 after API endpoints were changed.

This config is to specify the port. (which works btw) It doesn’t need to be 5999

I have this set when the guide was updated yet I still see FATAL error when container is stopped gracefully.

It is just should not be 7777 (if I remember correctly it is a default port for the server, if not specified)

So, you have specified a debug port and the line is uncommented and you still have such error on shutdown?

Yes.

Thanks for confirmation!
Updated an issue that workaround doesn’t work anymore.

3 Likes