Are these regular download errors?

I don’t see canceled with them:
2020-03-20T02:54:47.757Z ERROR piecestore download failed {“Piece ID”: “2BGJWGHGAW7PA3REMY2J53TDONBU3YNBJS4C5OJ6ZYDWF34A3WVA”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Action”: “GET”, “error”: “write tcp 172.17.0.2:28967->34.95.8.124:59940: write: connection reset by peer”, “errorVerbose”: “write tcp 172.17.0.2:28967->34.95.8.124:59940: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:221\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:318\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1079\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func5.1:648\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22”}

What version are you running on?

The version is v0.34.6.

I would have expected this one to be a download cancelled on that version. Either way, this doesn’t seem like anything to worry about.

There are too many of them not to worry.
If the log is correct, they start on March 19, and there have been 4366 failed downloads since then, seemingly all with one of the errors above.
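
One way to check when the failures actually began is to count "download failed" lines per day. This is only a rough sketch: it assumes each log line carries an ISO timestamp like the one in the line quoted above, and the sample lines are shortened, made-up stand-ins rather than real log output.

```python
import re
from collections import Counter

# Matches the date part of an ISO timestamp such as 2020-03-20T02:54:47.757Z
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}")


def failed_downloads_per_day(lines):
    """Count lines containing 'download failed', grouped by calendar date."""
    per_day = Counter()
    for line in lines:
        if "download failed" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day


# Illustrative lines only (shortened and invented for this sketch).
sample = [
    "2020-03-19T23:10:02.113Z ERROR piecestore download failed {...}",
    "2020-03-20T02:54:47.757Z ERROR piecestore download failed {...}",
    "2020-03-20T03:01:12.400Z INFO piecestore downloaded {...}",
    "2020-03-20T03:05:59.002Z ERROR piecestore download failed {...}",
]

for day, count in sorted(failed_downloads_per_day(sample).items()):
    print(day, count)
```

To run it against a real log, the list can be replaced with an open file handle, since the function only iterates over lines.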

Well that may suggest some connection issues. I’d start by giving everything a reboot. Node hardware, routers, switches, modems. I hope you’re not on wifi, but if so, connect a cable and make sure they’re all plugged in well.

I am not sure but I believe it started with 0.34.6.
I guess I will wait for 0.35.5 before I will further investigate.

No, I am not on a wifi and everything else is working.
This really kills my download rate:

========== DOWNLOAD ===========

Failed: 1177
Fail Rate: 11.397%
Canceled: 11
Cancel Rate: 0.107%
Successful: 9139
Success Rate: 88.496%

My fail rate is about 100 times the cancel rate. I don’t think it should be this way.
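
For reference, the percentages above follow directly from the three counters. The snippet below is a small sketch of that arithmetic, assuming each rate is simply the counter divided by the sum of all three; the function name is made up for illustration.

```python
def download_rates(failed, canceled, successful):
    """Return each outcome as a percentage of all download attempts."""
    total = failed + canceled + successful
    if total == 0:
        raise ValueError("no downloads recorded")
    return {
        "fail_rate": 100 * failed / total,
        "cancel_rate": 100 * canceled / total,
        "success_rate": 100 * successful / total,
    }


# Counters taken from the stats posted above.
rates = download_rates(failed=1177, canceled=11, successful=9139)
print(f"Fail Rate: {rates['fail_rate']:.3f}%")
print(f"Cancel Rate: {rates['cancel_rate']:.3f}%")
print(f"Success Rate: {rates['success_rate']:.3f}%")
print(f"Failed per canceled download: {1177 / 11:.0f}")
```

With these numbers there are 107 failed downloads for every canceled one, which is where the "about 100 times" comes from.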

When I look up the error message “use of closed network connection”, Google points to some Golang-specific issues.

I wish some Storj folks would look into this as I am not sure if the issue is server/connection specific on my side or if there is some application issue. Or maybe even Docker related. I have no idea.

Edit: I am not even sure if it is ip related. It seems that certain ips come up many times with download failed error in the log, e.g.
34.95.11.198 -> 94 times
34.83.3.52 -> 9 times
34.95.45.148 -> 107 times

As far as I can see, the log does not contain IP information for successful downloads, so I cannot tell if there are any successful downloads for these IPs at all.
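
A per-IP count like the one above can be produced with a few lines of Python. This is a sketch, not an official tool: it assumes the remote address appears in the error text as "->IP:port", as it does in the log lines quoted in this thread, and the sample lines are shortened versions of those.

```python
import re
from collections import Counter

# Remote peer as it appears in the error text: "...:28967->34.95.8.124:59940"
REMOTE_RE = re.compile(r"->(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def failed_downloads_by_ip(lines):
    """Count 'download failed' lines per remote IPv4 address."""
    counts = Counter()
    for line in lines:
        if "download failed" not in line:
            continue
        match = REMOTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample lines modeled on the errors quoted in this thread.
sample = [
    "ERROR piecestore download failed write tcp 172.17.0.2:28967->34.95.8.124:59940: write: connection reset by peer",
    "ERROR piecestore download failed write tcp 172.17.0.2:28967->34.95.8.124:60112: write: connection reset by peer",
    "INFO piecestore download failed write tcp ip:28967->95.217.158.23:51092: use of closed network connection",
    "INFO piecestore downloaded",
]

for ip, count in failed_downloads_by_ip(sample).most_common():
    print(f"{ip} -> {count} times")
```

Since successful downloads are logged without an address, this can only show which peers fail often, not what share of their downloads fail.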

How is your HDD connected to the PC?

Internal Sata port connection.

Then only location plays a role.
Is your node vetted?

Yes the node is vetted.

Here is my actual download stats for the last 40 hours:

Failed:                2186
Fail Rate:             15.664%
Canceled:              14
Cancel Rate:           0.100%
Successful:            11756
Success Rate:          84.236%

First of all, I need to understand the difference between a failed download and a canceled download. As I understand it, a canceled download is a race lost due to speed.
So the huge number of failed downloads seems to be something entirely different.

What kind of location issue could result in 2186 failed downloads that are not lost races?
In numbers, 15% of paid downloads fail, which is why I need to know the cause.

Is there a way to get more debug information about the failed downloads?

I do have 5% of failed downloads, 16% cancelled.
Not entirely sure what the difference is either.
But looking at my logs, a failed download occurs when the peer connection is reset… Maybe that means that the client aborted the download while cancelled errors mean I have lost the race?

Currently I do not understand the meaning of the errors. I am totally fine with them if it is like you say, when the client is not happy and leaves.
But if these errors happen because my node is not working properly then I would like to know what the issue is and solve it.
By the way 15% in my case sounds really really high.

It may not really be comparable but my failed upload rate is 0,046%

I’m seeing similar errors appear from time to time.

The difference between cancelled and failed is very simple. Only context cancelled errors are now shown as cancelled. Unfortunately those aren’t the only ones that can occur because of “losing the race”. They are the most common ones though. Now I’m not entirely sure that these errors result from losing the race as well, but that is my estimate right now.
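
To make that distinction concrete, here is a rough sketch of how one could regroup the log errors by message text. The grouping is only my estimate as described above, not how the node itself classifies them; the category names and the function are made up for illustration.

```python
# Error text that is counted as "canceled" today (Go spells it with one "l").
CANCELED_MARKER = "context canceled"

# Errors seen in this thread that may also come from the client going away.
# Treating them as lost races is an assumption, not confirmed behavior.
POSSIBLY_CLIENT_CLOSED = (
    "connection reset by peer",
    "use of closed network connection",
)


def classify_download_error(error_text):
    """Return a rough category for the 'error' field of a failed download."""
    if CANCELED_MARKER in error_text:
        return "canceled"
    if any(marker in error_text for marker in POSSIBLY_CLIENT_CLOSED):
        return "possibly client closed"
    return "failed"


print(classify_download_error("context canceled"))
print(classify_download_error(
    "write tcp 172.17.0.2:28967->34.95.8.124:59940: write: connection reset by peer"
))
print(classify_download_error("file does not exist"))
```

If most "failed" lines land in the middle group, the real failure rate of the node is probably much lower than the raw numbers suggest.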

All mentioned errors are related to connectivity between your node and customer. It can interrupt downloads abruptly, not just cancel during the long tail cut.
I think there is nothing you can do about it.

After updating to version 1.0.1 today, I can see a significant decrease in failed downloads:

========== DOWNLOAD ===========
Failed: 56
Fail Rate: 2.451%
Canceled: 3
Cancel Rate: 0.131%
Successful: 2226
Success Rate: 97.418%

It seems that “something” definitely was not right when the previous version was running.

I just checked with my second node which is in a data center. It is showing the same errors:

"Apr 1 10:01:45 storagenode[13282]: 2020-04-01T10:01:45.827+0200#011#033[34mINFO#033[0m#011piecestore#011download failed#011{“Piece ID”: “J7JEDM5M5UULSBJJWQL7F7OGZO4PV4FYLOCUB7XBTQCNTWFMYPHA”, “Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Action”: “GET”, “error”: “write tcp ip:28967->95.217.158.23:51092: use of closed network connection”, “errorVerbose”: "write tcp ip:28967->95.217.158.23:51092: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:221\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:318\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1168\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func5.1:667\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}"

I just updated it to the new version so maybe the error will not show up again.

Yes, they’ve been showing up on my node as well, for a little over 2% of the downloads. I honestly think this is another error that should probably be logged at INFO level as “download canceled” instead of “download failed”, as it seems to result from the uplink closing the connection as well.