Error piecestore protocol: rpc error: code = canceled desc = context canceled

Hey all,

I regularly get the error attached. Is it a problem?

Here is the first line.
storj.io/storj/storagenode/piecestore.(*Endpoint).Upload:283

The upload failed (context canceled) errors aren’t actually errors; they are part of how the system works. Every upload is cut into 130 erasure-encoded pieces, of which only 29 are needed to recreate the data. Uploads stop after 80 of those pieces have been uploaded successfully. The other 50 nodes, which were the slowest for that transfer, will see the upload failed error in their logs.

The same goes for downloads: 50 nodes get selected initially, and the 21 that are not among the first 29 to finish get canceled.

We are still in the process of adjusting these figures, so the frequency at which you run into these “context canceled” notices may vary. However, if you would like to win the race to be among the first nodes to upload the piece more often, you would need to look into getting better bandwidth from your ISP.
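The arithmetic behind that race, using only the numbers from the explanation above (the constant names are mine), looks like this:

```python
# Upload race arithmetic from the explanation above:
# 130 erasure-coded pieces per segment, 29 needed to reconstruct,
# and the upload stops once 80 pieces have landed successfully.
TOTAL_PIECES = 130
PIECES_NEEDED = 29
SUCCESS_TARGET = 80

# Nodes that lose the race and log "context canceled":
canceled = TOTAL_PIECES - SUCCESS_TARGET
print(canceled)                            # 50

# So even a perfectly healthy node sees cancellations on
# roughly this share of the uploads it participates in:
print(round(canceled / TOTAL_PIECES, 2))   # 0.38
```

In other words, a sizable fraction of “upload failed (context canceled)” lines is expected by design, independent of node health.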

Please read about the other codes in the topic: Error Codes: What they mean and Severity Level [READ FIRST]


Hello) I am a new node operator, and today I see an ERROR in my log file. Can you tell me what it means, and what I must do to fix it?

```
2019-07-27T12:20:48.559Z        INFO    piecestore      upload started  {"Piece ID": "G35E5SFMRHE575ESJJR44G2F6YSR5UWKQNX7BYAS5SG5NAGX3Y2A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
2019-07-27T12:20:52.141Z        INFO    piecestore      upload started  {"Piece ID": "3CO2AWB7AQOIVKBSBZVA34ZBEFY3TN3QIAYA6TY2F45JAVI7LA4A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
2019-07-27T12:20:52.568Z        INFO    piecestore      upload failed   {"Piece ID": "G35E5SFMRHE575ESJJR44G2F6YSR5UWKQNX7BYAS5SG5NAGX3Y2A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT", "error": "piecestore protocol: rpc error: code = Canceled desc = context canceled", "errorVerbose": "piecestore protocol: rpc error: code = Canceled desc = context canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:238\n\tstorj.io/storj/pkg/pb._Piecestore_Upload_Handler:701\n\tstorj.io/storj/pkg/server.logOnErrorStreamInterceptor:23\n\tgoogle.golang.org/grpc.(*Server).processStreamingRPC:1209\n\tgoogle.golang.org/grpc.(*Server).handleStream:1282\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717"}
2019-07-27T12:20:52.568Z        ERROR   piecestore protocol: rpc error: code = Canceled desc = context canceled
        storj.io/storj/storagenode/piecestore.(*Endpoint).Upload:238
        storj.io/storj/pkg/pb._Piecestore_Upload_Handler:701
        storj.io/storj/pkg/server.logOnErrorStreamInterceptor:23
        google.golang.org/grpc.(*Server).processStreamingRPC:1209
        google.golang.org/grpc.(*Server).handleStream:1282
        google.golang.org/grpc.(*Server).serveStreams.func1.1:717
```

Hello @bodiaSBK,
Welcome to the forum!

The upload failed (context canceled) errors aren’t actually errors; they are part of how the system works. Every upload is cut into 130 erasure-encoded pieces, of which only 29 are needed to recreate the data. Uploads stop after 80 of those pieces have been uploaded successfully. The other 50 nodes, which were the slowest for that transfer, will see the upload failed error in their logs.

We are still in the process of adjusting these figures, so the frequency at which you run into these “context canceled” notices may vary. However, if you would like to win the race to be among the first nodes to upload the piece more often, you would need to look into getting better bandwidth from your ISP.


Thx, @Alexey. Does it make my reputation lower? There’s no better bandwidth to get) I have 1 Gbps already, so I think it’s only distance…
thx u

No. Your node will just get fewer pieces from those customers.
The data is distributed close to the customer.

@Alexey
Well, I realize that, in theory, customers who are closer to me will be more likely to upload data to my node.
And as I understand it, the low channel load (<= 1mb) is connected with the fact that clients must actually be using the network in order for your node to receive any work.

Yes, the usage comes from customers.
In the alpha stage our customers are developers and testers; they perform tests when and as much as they need, so there may be no constantly flowing traffic.
That said, the behavior could be similar in production too.

@Alexey How can you tell whether the volume of these is excessive? I go through my logs and see this quite frequently. Is there a stat we might see on the SNOBoard, once it goes live, that will help us tell whether we have a network problem? In due course, how frequently would we expect this in the logs?

I don’t think this information will be on the SNOboard, but we will see.
There is no “normal” count of errors, but you can calculate the percentage with the script: Script: Calculate Success Rates for Audit, Download, Upload, Repair
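To give a feel for what such a calculation does, the upload part can be sketched in a few lines of Python. This is a simplified stand-in, not the linked script itself; it assumes that successful uploads are logged with an “uploaded” marker (my assumption, since only “upload started” and “upload failed” appear in the pastes above):

```python
# Simplified upload success-rate counter for a storagenode log.
# Assumes "uploaded" marks a success and "upload failed" a loss;
# the real linked script handles more cases (audits, repairs, etc.).
def upload_success_rate(log_lines):
    successful = sum(1 for line in log_lines if "uploaded" in line)
    failed = sum(1 for line in log_lines if "upload failed" in line)
    total = successful + failed
    rate = 100.0 * successful / total if total else 0.0
    return successful, failed, rate

# Example with three fake log lines:
lines = [
    'INFO piecestore upload started {"Piece ID": "..."}',
    'INFO piecestore uploaded {"Piece ID": "..."}',
    'ERROR piecestore upload failed {"Piece ID": "..."}',
]
print(upload_success_rate(lines))  # (1, 1, 50.0)
```

Note that “upload started” lines are ignored on purpose: a start line only opens a race, and only the success/failure outcome counts toward the rate.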

I set up that script. It’s awesome, thanks for that. The results don’t look promising LOL

```
========== AUDIT =============
Successful: 10
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 35
Failed: 2
Success Rate: 94.595%
========== UPLOAD ============
Successful: 66
Rejected: 25
Failed: 1313
Acceptance Rate: 72.528%
Success Rate: 4.786%
========== REPAIR DOWNLOAD ===
Successful: 0
Failed: 0
Success Rate: 0.000%
========== REPAIR UPLOAD =====
Successful: 0
Failed: 0
Success Rate: 0.000%
```

It’s the upload stat I’m concerned about. My acceptance rate looks a bit low, but the more concerning figure is the failed attempts: the success rate is only 4.7%. That can’t be good.

I did a network speed test straight from the node and I get the following:

PING 5ms
DOWNLOAD 112.70 Mbps
UPLOAD 32.58 Mbps

Thoughts @Alexey?

And here’s my result.
[image]
Uptime is 50 h; is that okay?

Yours looks fine. My upload numbers look like shit LOL. My network is in good shape, so the results are confusing.

You can try playing with this parameter: Should I change max-concurrent-requests? (ERROR piecestore upload rejected, too many requests)
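For reference, that parameter lives in the node’s config.yaml. A minimal sketch, assuming the key name used by storagenode builds of that era (check the linked topic for the authoritative name and default):

```yaml
# config.yaml: cap on simultaneous uploads the node will accept.
# Requests beyond this limit are rejected with "too many requests".
storage2.max-concurrent-requests: 25
```

The value 25 here simply mirrors the one the poster settles on below; lower it if most of your accepted uploads end up losing the race anyway.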

I fixed that one and bumped it to 25. The more concerning stat now is my success rate on uploads. I restarted the docker container just to clear the stats, and I’ll monitor it over the next few days. The rejected uploads are all sitting at 0 with the limit at 25. I also installed Netdata and saw that my container needed more RAM; after allocating 8 GB, I’m no longer at 4% success on uploads but at 8% on average. I’ll let it run longer to get some better stats and see how it looks after a day or two.

This is the current after a couple of hours after those fixes:

```
========== AUDIT =============
Successful: 6
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 26
Failed: 2
Success Rate: 92.857%
========== UPLOAD ============
Successful: 53
Rejected: 0
Failed: 759
Acceptance Rate: 100.000%
Success Rate: 6.527%
========== REPAIR DOWNLOAD ===
Successful: 0
Failed: 0
Success Rate: 0.000%
========== REPAIR UPLOAD =====
Successful: 0
Failed: 0
Success Rate: 0.000%
```

Looking at the networking, I set up some stats collection on my firewall for where traffic was coming from and going to… I see that the data being sent to my storage node seems to originate from Germany, and I’m in Canada. That would seem like a good contender for why I have so many failed uploads: if I’m competing with nodes in Germany for that work, of course I will lose…

What’s the best way to see what the satellites are doing with my IP address, i.e. why Germany would be the location sending me data blocks to store?

Firewall logs for traffic on port 28967

Source Destination: nodes.dev.storj.space
Country: DE
Traffic Type: TCP
Port: 28967
Total Ingress: 1.16 GB
Total Egress: 32.6 MB
Total Connection Time: 7.5 hours

Our main stress-tester is in Germany :slight_smile:
This is alpha, and our customers are testers and developers; most of the traffic usually comes on business days.

Ahh, so give it more time to see better results? Prioritize uptime over worrying about these errors?

Your node needs to be vetted on each satellite. To be vetted on a given satellite, it must successfully pass 100 audits from it. An unvetted node receives only a small portion of data (5% at the moment) until it is vetted.
So uptime is key, and of course don’t lose customers’ data.
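A back-of-the-envelope sketch of those vetting numbers (the 100 audits and the 5% share come from the post above; the audit rate is a hypothetical figure for illustration only):

```python
# Vetting math: a node must pass 100 audits per satellite and
# receives only ~5% of data until then (figures from the post).
AUDITS_NEEDED = 100
UNVETTED_SHARE = 0.05

# Hypothetical audit rate; real rates depend on satellite traffic
# and on how much data the node has already stored.
audits_per_day = 4
days_to_vet = AUDITS_NEEDED / audits_per_day
print(days_to_vet)  # 25.0 days at this assumed rate
```

The takeaway matches the advice above: since audits only come against data you already hold, staying online and accepting traffic is what moves vetting along.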

With a 4% success rate, I would recommend a much lower concurrency setting.