Download Failed -> GET read: connection timed out

Sometimes downloads from my node fail:

{L:ERROR,T:2023-08-23T08:35:10Z,N:piecestore,M:download failed,process:storagenode,Piece ID:K5J2VNZAR6BYDZOXIOY5DW2JIIIMGPQIGAFZCZVX3GTG2655ADIQ,Satellite ID:12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs,Action:GET,Offset:0,Size:1781760,Remote Address:y.y.y.y:54930,error:manager closed: read tcp x.x.x.x:49419->y.y.y.y:54930: read: connection timed out,errorVerbose:manager closed: read tcp x.x.x.x:49419->y.y.y.y:54930: read: connection timed out
github.com/jtolio/noiseconn.(Conn).readMsg:211
github.com/jtolio/noiseconn.(Conn).Read:171
storj.io/drpc/drpcwire.(Reader).ReadPacketUsing:96
storj.io/drpc/drpcmanager.(Manager).manageReader:226}

Any idea why this happens?

Either your router dropped the connection, or the customer’s client cut the long tail, that is, it canceled the slowest transfers once it had enough pieces.

Thanks for the reply, @Alexey

If the download is cancelled I would not expect a read timeout. Shouldn’t the connection be closed gracefully?

No, it’s intentional

So the resulting error in the node’s logs may differ, depending on the moment at which the client canceled the connection.
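To illustrate why a node can see an abrupt cancellation, here is a minimal sketch of long-tail cancellation using `asyncio`. It is not the actual uplink code. The node names, delays, and the `needed` count are made up for the example; the only idea taken from this thread is that the client starts more transfers than it needs and drops the slowest ones without waiting for them.

```python
import asyncio


async def fetch_piece(node: str, delay: float, canceled: list) -> str:
    """Pretend to download one piece from a node (hypothetical helper)."""
    try:
        await asyncio.sleep(delay)  # stands in for the network transfer
        return node
    except asyncio.CancelledError:
        # This is the moment the node would log something like "download canceled".
        canceled.append(node)
        raise


async def download_segment(node_delays: dict, needed: int):
    """Start one transfer per node, keep the first `needed`, cancel the rest."""
    canceled = []
    tasks = [
        asyncio.create_task(fetch_piece(node, delay, canceled))
        for node, delay in node_delays.items()
    ]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
        if len(finished) >= needed:
            break
    # Cut the long tail: the slow transfers are simply canceled.
    for task in tasks:
        if not task.done():
            task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return finished, sorted(canceled)


def run_demo():
    # Assumed delays in seconds; node-d plays the slow node.
    delays = {"node-a": 0.01, "node-b": 0.02, "node-c": 0.03, "node-d": 0.5}
    return asyncio.run(download_segment(delays, needed=3))


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the slow node never gets to finish its transfer, so what it logs depends on what it was doing at the moment the client gave up.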


So that I understand what this means: The client does not properly terminate the connection. The server keeps the connection open until it times out.

Isn’t this inefficient from the server’s point of view under heavy load, when many connections end up timing out? Aren’t connections kept open unnecessarily?

The server in this context is your node. It may try to keep the connection open, but the connection will be terminated anyway.

Did you ever have a problem with new connections to your node being refused? Are there any connection rejections in your logs? Or did you see full bandwidth usage?
Usually the client terminates a connection because your node is unable to deliver data fast enough, not because there are too many connections that have not been closed yet.
On the other hand, your node cannot be close to every customer in the world.

Did you ever have a problem with new connections to your node being refused? Are there any connection rejections in your logs? Or did you see full bandwidth usage?

I only see the read connection errors when there are many download connections. I monitor my node with storj-monitor, and it reports connection timeouts to my node at the same time. So I would say yes, there are connection problems.

Peak download traffic is not much, only 40 Mbit/s. The node has a 1000/1000 Mbit/s connection, so the network connection is not saturated. The node has enough RAM and CPU, and load/iowait is also fine.

In addition to the read connection errors mentioned above, I also see a lot of paired log entries: about 40 “download started” entries per second and, 1 second later, about 40 “download canceled” entries for the same piece ID, from the same remote address but with different ports. Here is one example of the corresponding log lines:

{"L":"INFO","T":"2023-08-24T00:37:01Z","N":"piecestore","M":"download started","process":"storagenode","Piece ID":"U3KA6GYKMJVPDUBWUJKABPHRWAIYJXAD3AMSYAT5OJ6KRV7FBM2A","Satellite ID":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S","Action":"GET","Offset":0,"Size":262144,"Remote Address":"x.x.x.x:57258"}
{"L":"INFO","T":"2023-08-24T00:37:02Z","N":"piecestore","M":"download canceled","process":"storagenode","Piece ID":"U3KA6GYKMJVPDUBWUJKABPHRWAIYJXAD3AMSYAT5OJ6KRV7FBM2A","Satellite ID":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S","Action":"GET","Offset":0,"Size":262144,"Remote Address":"x.x.x.x:57258"}

I just want to double-check whether this is designed behaviour or a bug.
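For anyone who wants to measure this pattern in their own logs, here is a small sketch that pairs each “download started” entry with the matching “download canceled” entry and reports the time between them. It assumes the node writes one JSON object per line with the field names seen in this thread (`M`, `T`, `Piece ID`, `Remote Address`) and plain straight quotes. The function name `cancel_latencies` is made up for the example.

```python
import json
from datetime import datetime


def parse_time(stamp: str) -> datetime:
    # Timestamps in these logs look like 2023-08-24T00:37:01Z.
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def cancel_latencies(lines):
    """Return (piece_id, remote_address, seconds) for each started/canceled pair."""
    started = {}
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip stack-trace lines and other non-JSON output
        key = (entry.get("Piece ID"), entry.get("Remote Address"))
        message = entry.get("M")
        if message == "download started":
            started[key] = parse_time(entry["T"])
        elif message == "download canceled" and key in started:
            delta = parse_time(entry["T"]) - started.pop(key)
            pairs.append((key[0], key[1], delta.total_seconds()))
    return pairs


SAMPLE = [
    '{"L":"INFO","T":"2023-08-24T00:37:01Z","N":"piecestore","M":"download started",'
    '"Piece ID":"U3KA6GYKMJVPDUBWUJKABPHRWAIYJXAD3AMSYAT5OJ6KRV7FBM2A",'
    '"Action":"GET","Remote Address":"x.x.x.x:57258"}',
    '{"L":"INFO","T":"2023-08-24T00:37:02Z","N":"piecestore","M":"download canceled",'
    '"Piece ID":"U3KA6GYKMJVPDUBWUJKABPHRWAIYJXAD3AMSYAT5OJ6KRV7FBM2A",'
    '"Action":"GET","Remote Address":"x.x.x.x:57258"}',
]

if __name__ == "__main__":
    for piece_id, remote, seconds in cancel_latencies(SAMPLE):
        print(piece_id, remote, seconds)
```

The key includes the remote address and port, so parallel requests for the same piece from different ports are kept apart.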

There are two possible options:

  1. Network issues. You will see a lot of errors such as “deadline timeout exceeded”, “no route to host”, “i/o timeout”, etc.
  2. The client closing connections because your node is too slow for them. “download canceled”, “upload canceled”, “context canceled”, etc.

From the information provided so far, I assume that these errors are downloads canceled by the customers.
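The two groups above can be turned into a rough tally, sketched below. The marker strings are only the examples quoted in this thread, not a complete list of what a node can log, and the category names `network`, `client_canceled`, and `other` are made up for the example.

```python
from collections import Counter

# Example phrases taken from the two options listed in this thread.
NETWORK_MARKERS = ("deadline timeout exceeded", "no route to host", "i/o timeout")
CANCEL_MARKERS = ("download canceled", "upload canceled", "context canceled")


def classify(line: str) -> str:
    """Put one log line into a rough category based on its text."""
    text = line.lower()
    if any(marker in text for marker in CANCEL_MARKERS):
        return "client_canceled"
    if any(marker in text for marker in NETWORK_MARKERS):
        return "network"
    return "other"


def tally(lines) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    sample = [
        "piecestore download canceled Piece ID ...",
        "piecestore upload failed: i/o timeout",
        "piecestore download started Piece ID ...",
    ]
    print(tally(sample))
```

If the `client_canceled` count dominates, that points to the second option; a large `network` count would be worth checking against the router and the ISP first.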