Strange error: what does it mean?

Hello.

Does anyone know what this error means? It shuts the node down, by the way.

2021-09-12T01:34:53.645+0300 ERROR servers unexpected shutdown of a runner {"name": "server", "error": "read udp [::]:32000: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.", "errorVerbose": "read udp [::]:32000: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.\n\tstorj.io/drpc/drpcserver.(*Server).Serve:88\n\tstorj.io/storj/private/server.(*Server).Run.func5:224\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-09-12T01:34:54.213+0300 FATAL Unrecoverable error {"error": "read udp [::]:32000: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.", "errorVerbose": "read udp [::]:32000: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.\n\tstorj.io/drpc/drpcserver.(*Server).Serve:88\n\tstorj.io/storj/private/server.(*Server).Run.func5:224\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

Hi Vadim,

Thanks for bringing this to our attention. The error looks mostly harmless; it’s just indicating that there was a failure on a QUIC connection. But yes, it’s clearly wrong that that is being treated as a fatal error and crashing the process. We’ll look into it!


I had this same error several times. I set the Storj Windows service to automatically restart when it fails as a workaround, but it hasn’t happened to me in a few days.

It appears that this is an issue that occurs when a Windows storage node starts to accept a QUIC (UDP-based) connection, but the connection fails partway through. The accept() system call returns WSAENETRESET, which quic-go doesn’t recognize as a temporary error, so our code thinks that it is no longer able to accept any connections. The runner handling QUIC connections shuts down, which causes the whole node to die.

We have a fix pending for DRPC which makes it treat WSAENETRESET (and a few other errors) as temporary, so that it doesn’t all crash.
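For anyone curious what "treating it as temporary" means in practice, here is a rough sketch of the idea in Go. This is not the actual DRPC patch: the names wsaENETRESET, errListenerClosed, isTemporary, and serve are hypothetical and exist only for this illustration, and the error code is declared locally (10052 is the Winsock value of WSAENETRESET) so that the example compiles on any OS. The accept loop skips a temporary error and keeps listening, and only a non-temporary error ends the loop.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// wsaENETRESET mirrors the Winsock error code WSAENETRESET (10052).
// Hypothetical local constant, declared here so the sketch builds on any OS.
const wsaENETRESET syscall.Errno = 10052

// errListenerClosed stands in for a genuinely unrecoverable error (hypothetical).
var errListenerClosed = errors.New("listener closed")

// isTemporary reports whether err wraps an errno that should not
// stop the accept loop. Only WSAENETRESET is covered in this sketch.
func isTemporary(err error) bool {
	var errno syscall.Errno
	return errors.As(err, &errno) && errno == wsaENETRESET
}

// serve calls accept in a loop. A temporary error affects only one
// connection, so the loop continues; any other error is returned.
func serve(accept func() (string, error), handle func(string)) error {
	for {
		conn, err := accept()
		if err != nil {
			if isTemporary(err) {
				continue // one connection failed; keep listening
			}
			return err // unrecoverable; the caller decides what to do
		}
		handle(conn)
	}
}

func main() {
	// Simulated accept results: success, WSAENETRESET, success, fatal error.
	results := []error{
		nil,
		fmt.Errorf("read udp [::]:32000: wsarecvfrom: %w", wsaENETRESET),
		nil,
		errListenerClosed,
	}
	i := 0
	accept := func() (string, error) {
		err := results[i]
		i++
		return fmt.Sprintf("conn-%d", i), err
	}
	err := serve(accept, func(c string) { fmt.Println("handled", c) })
	fmt.Println("serve stopped:", err)
}
```

In this simulation the second accept fails with the wrapped WSAENETRESET and is skipped, the third connection is still handled, and only the final non-temporary error ends the loop. Without the isTemporary check, the loop would return on the second call, which is roughly what took the node down.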
