Hey all,
I regularly get the error attached. Is it a problem?
Here is the first line.
storj.io/storj/storagenode/piecestore.(*Endpoint).Upload:283
The upload failed (context canceled) errors aren't actually errors; they're part of how the system works. Every upload is cut up into 130 erasure-encoded pieces, of which 29 are needed to recreate the data. Uploads stop after 80 of those pieces have been uploaded successfully. The other 50 nodes, which were the slowest for that transfer, will see the upload failed error in their logs.
The same applies to downloads, except that 50 nodes are selected initially and 21 of them get canceled when the first 29 finish.
We are still in the process of adjusting these figures, so the frequency with which you run into these "context canceled" notices may vary. However, if you would like to reduce how often you don't manage to win the race to be among the first nodes to upload the piece, you would need to look into getting better bandwidth from your ISP.
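If it helps to picture it, here is a minimal Go sketch (not the actual uplink code; the piece delays are invented for the example) of the pattern: all transfers share one context, and once 80 of them succeed that context is canceled, so the slowest nodes see exactly this error on their side.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

func main() {
	const totalPieces = 130 // pieces sent out per segment
	const enough = 80       // uploads stop once this many succeed

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var mu sync.Mutex
	var succeeded, canceled int
	var wg sync.WaitGroup

	for i := 0; i < totalPieces; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Simulate a node finishing its piece after a random delay.
			done := time.After(time.Duration(rand.Intn(200)) * time.Millisecond)
			select {
			case <-done:
				mu.Lock()
				succeeded++
				if succeeded == enough {
					cancel() // the slowest nodes now see "context canceled"
				}
				mu.Unlock()
			case <-ctx.Done():
				mu.Lock()
				canceled++
				mu.Unlock()
			}
		}()
	}

	wg.Wait()
	fmt.Printf("successful: %d, canceled: %d\n", succeeded, canceled)
}
```

In other words, the error in your log just means the uplink no longer needed your copy of the piece; the transfer itself was healthy.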
Please read about the other codes in this topic: Error Codes: What they mean and Severity Level [READ FIRST]
Hello) I am a new node operator, and today I see an ERROR in my log file. Can you tell me what it means, and what I must do to fix it?
`2019-07-27T12:20:48.559Z INFO piecestore upload started {"Piece ID": "G35E5SFMRHE575ESJJR44G2F6YSR5UWKQNX7BYAS5SG5NAGX3Y2A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
2019-07-27T12:20:52.141Z INFO piecestore upload started {"Piece ID": "3CO2AWB7AQOIVKBSBZVA34ZBEFY3TN3QIAYA6TY2F45JAVI7LA4A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
2019-07-27T12:20:52.568Z INFO piecestore upload failed {"Piece ID": "G35E5SFMRHE575ESJJR44G2F6YSR5UWKQNX7BYAS5SG5NAGX3Y2A", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT", "error": "piecestore protocol: rpc error: code = Canceled desc = context canceled", "errorVerbose": "piecestore protocol: rpc error: code = Canceled desc = context canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:238\n\tstorj.io/storj/pkg/pb._Piecestore_Upload_Handler:701\n\tstorj.io/storj/pkg/server.logOnErrorStreamInterceptor:23\n\tgoogle.golang.org/grpc.(*Server).processStreamingRPC:1209\n\tgoogle.golang.org/grpc.(*Server).handleStream:1282\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717"}
2019-07-27T12:20:52.568Z ERROR piecestore protocol: rpc error: code = Canceled desc = context canceled
storj.io/storj/storagenode/piecestore.(*Endpoint).Upload:238
storj.io/storj/pkg/pb._Piecestore_Upload_Handler:701
storj.io/storj/pkg/server.logOnErrorStreamInterceptor:23
google.golang.org/grpc.(*Server).processStreamingRPC:1209
google.golang.org/grpc.(*Server).handleStream:1282
google.golang.org/grpc.(*Server).serveStreams.func1.1:717`
Hello @bodiaSBK,
Welcome to the forum!
The upload failed (context canceled) errors aren't actually errors; they're part of how the system works. Every upload is cut up into 130 erasure-encoded pieces, of which 29 are needed to recreate the data. Uploads stop after 80 of those pieces have been uploaded successfully. The other 50 nodes, which were the slowest for that transfer, will see the upload failed error in their logs.
We are still in the process of adjusting these figures, so the frequency with which you run into these "context canceled" notices may vary. However, if you would like to reduce how often you don't manage to win the race to be among the first nodes to upload the piece, you would need to look into getting better bandwidth from your ISP.
Thx, @Alexey. Does it make my reputation lower? Better bandwidth isn't the issue) 1 Gbps, it's only distance I think…
thx u
No. Your node will just get fewer pieces from those customers.
The data is distributed close to the customer.
@Alexey
Well, I realized that, in theory, customers who are closer to me will be more likely to upload data to my node.
And as I understand it, the low channel load (<= 1 Mb) comes down to the fact that clients must actually be using the network in order for you to receive work.
Yes, the usage is coming from customers.
In the alpha stage our customers are developers and testers; they perform tests when they need to and as much as they need. There may be no constantly flowing traffic.
Though the behavior could be similar in production too.
@Alexey How can you tell if the volume of these is excessive? I can go through my logs and I see this quite frequently. Is there some stat we might see on the SNOBoard as it goes live that would help us see whether we have a network problem? In due course, how frequently would we expect this in the logs?
I don't think this information will be on the SNOBoard. But we will see.
There is no "normal" count of errors. But you can calculate the percentage with Script: Calculate Success Rates for Audit, Download, Upload, Repair
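If you prefer to see the idea rather than run the full script, a rough Go sketch could count matching lines in the node log and turn them into a percentage. The phrases "uploaded" and "upload failed" are assumptions here, so check them against what your own log actually prints:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	var ok, failed int

	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines

	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.Contains(line, "upload failed"):
			failed++
		case strings.Contains(line, "uploaded"): // assumed success message; verify in your log
			ok++
		}
	}

	total := ok + failed
	if total == 0 {
		fmt.Println("no upload lines found")
		return
	}
	fmt.Printf("Successful: %d\nFailed: %d\nSuccess Rate: %.3f%%\n",
		ok, failed, 100*float64(ok)/float64(total))
}
```

You could feed it the log with something like `docker logs storagenode 2>&1 | go run successrate_sketch.go` (the file name is just an example); the linked script does the same kind of counting for audits, downloads, uploads and repair traffic.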
Set up that script. That's awesome. Thanks for that. The results don't look promising LOL
========== AUDIT =============
Successful: 10
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 35
Failed: 2
Success Rate: 94.595%
========== UPLOAD ============
Successful: 66
Rejected: 25
Failed: 1313
Acceptance Rate: 72.528%
Success Rate: 4.786%
========== REPAIR DOWNLOAD ===
Successful: 0
Failed: 0
Success Rate: 0.000%
========== REPAIR UPLOAD =====
Successful: 0
Failed: 0
Success Rate: 0.000%
It's the upload stat I'm concerned about. It looks like I'm accepting a bit low, but the more concerning stat is the failed attempts, which show only a 4.7% success rate. That can't be good.
I did a network speed test straight from the node and I get the following:
PING 5ms
DOWNLOAD 112.70 Mbps
UPLOAD 32.58 Mbps
Thoughts @Alexey?
And here's my result.
Uptime 50h, is it okay?
Yours looks fine. My upload numbers look like shit LOL My network is in good shape, so the results are confusing.
You can try to play with that parameter: Should I change max-concurrent-requests? (ERROR piecestore upload rejected, too many requests)
I fixed that one. Bumped it to 25. The more concerning stat now is my success rate on uploads. I reset the docker container just to clear the stats and I'll monitor it over the next few days. All the rejected uploads are sitting at 0 with 25 as the limit there. I also installed Netdata and saw that my container needed more RAM. I allocated 8 GB now and I'm no longer at 4% success on uploads; I'm at 8% on average now. I'll leave it running for longer so I get some better stats and see after a day or two how it looks.
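For anyone else looking for that setting, it lives in the storage node's config.yaml and looks roughly like this (the exact key name here is from memory, so verify it against your own config file or the linked topic):

```yaml
# storagenode config.yaml -- caps how many uploads the node accepts at once;
# anything above the limit shows up as "upload rejected, too many requests".
storage2.max-concurrent-requests: 25
```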
This is the current state a couple of hours after those fixes:
========== AUDIT =============
Successful: 6
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 26
Failed: 2
Success Rate: 92.857%
========== UPLOAD ============
Successful: 53
Rejected: 0
Failed: 759
Acceptance Rate: 100.000%
Success Rate: 6.527%
========== REPAIR DOWNLOAD ===
Successful: 0
Failed: 0
Success Rate: 0.000%
========== REPAIR UPLOAD =====
Successful: 0
Failed: 0
Success Rate: 0.000%
Looking at the networking, I set up some stats collection on my firewall on where traffic was coming from or being sent to… I see that the data being sent to my storage node seems to originate from Germany… I'm in Canada. That would seem like a good contender for why I have so many failed uploads. If I'm competing with nodes in Germany for the work, of course I will fail…
What's the best way to see what the satellites are doing with my IP address, in terms of why Germany would be the location sending me data blocks to store?
Firewall logs for traffic on port 28967
Source Destination: nodes.dev.storj.space
Country: DE
Traffic Type: TCP
Port: 28967
Total Ingress: 1.16 GB
Total Egress: 32.6 MB
Total Connection Time: 7.5 hours
Our main stress tester is in Germany.
This is alpha; our customers are testers and developers. Most of the traffic usually comes on business days.
Ahh, so give it more time to see better results? Prioritize uptime over these errors?
Your node should be vetted on each satellite. To be vetted on a given satellite, it should successfully pass 100 audits from it. An unvetted node will receive only a fraction of the data (5% at the moment) until it gets vetted.
So uptime is key, and of course do not lose customers' data.
With a 4% success rate, I would recommend a much lower concurrency setting.