I noticed a high I/O activity on my RPi 4B hosting my nodes, and found out that one of my nodes (v1.13.3) restarted unexpectedly:
...
2020-10-05T19:14:25.367Z FATAL Unrecoverable error {"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-05T19:14:37.217Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
...
This happened only once since the beginning of the month (I didn’t check older logs), and it seems the node is simply rescanning all files and resumed its life normally…
iirc it throws a fatal error every time I restarted or shut down my node and it was never forcefully stopped (unless there is a bug in the shutdown process).
I did run a test, and in fact I’m experiencing what @kevink described:
Stop de Node gracefully (with a 10 minute timeout):
$> time docker stop -t 600 storj_node_4
Stopping 'storj_node_4'...
storj_node_4
real 0m2.729s
user 0m0.060s
sys 0m0.079s
Here is the corresponding log:
...
2020-10-06T07:57:47.521Z INFO piecestore upload started {"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 906604904832}
2020-10-06T07:57:47.603Z INFO piecestore uploaded {"Piece ID": "QU3M2AC6WLD4S3REYEEYYQIQ6HJIHJZC7JH2RM45PHRERHUQTOZA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-10-06T07:57:50.134Z INFO Got a signal from the OS: "terminated"
2020-10-06T07:57:50.134Z INFO contact:service context cancelled {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-10-06T07:57:50.635Z ERROR servers unexpected shutdown of a runner {"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-10-06T07:57:51.491Z FATAL Unrecoverable error {"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
So it seems a FATAL entry does appear even though the Node gets shut down gracefully.
Maybe there’s something to be fixed here?
Besides, if the timeout were triggered (which isn’t the case above), docker would kill the container and the Node storage wouldn’t have the time to issue any FATAL log entry at all, would it?