High cancel rate in upload requests

Hi everyone,

I’ve started a new node today and I’ve noticed a high rate of canceled uploads. Is this normal? The output of successrate.sh:

At first I thought my node was unable to respond to some requests in time because I had set a high value for the BANDWIDTH argument when starting the docker container. But I’ve decreased the value twice and the same thing still happens. It is now set to 4TB, and my upload/download speed is 65/17Mbps. Does anybody know what the problem is?

How is your HDD connected to the PC with storagenode?

Canceled uploads or downloads can be the result of the long tail being cut off, or of the customer canceling the transfer.
Every upload is cut up into 110 erasure-encoded pieces, of which 29 are needed to recreate the data. Uploads stop after 80 of those pieces have been uploaded successfully. The other 30 nodes, which were the slowest for that transfer, will see the upload canceled error in their logs.

The same goes for downloads, but the uplink selects 39 nodes for the download and stops the others as soon as the first 29 have finished.
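To put rough numbers on the long-tail cut, here is a minimal Python sketch. The 110/80 and 39/29 figures come from the explanation above; everything else (the function names, the uniform 50-250 ms latency range for competing nodes) is a hypothetical assumption for illustration only, not how the uplink actually measures or selects nodes.

```python
import random


def baseline_cancel_rate(started: int, needed: int) -> float:
    """Fraction of transfers cut off when only the first `needed` of `started` finish."""
    return (started - needed) / started


def simulate_success_rate(my_latency_ms, started, needed, trials=2000, seed=0):
    """Estimate how often one node finishes among the first `needed` of `started`.

    ASSUMPTION: competitor latencies are drawn from a made-up uniform
    50-250 ms range; real latencies depend on location, disk and load.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        others = [rng.uniform(50, 250) for _ in range(started - 1)]
        faster = sum(1 for t in others if t < my_latency_ms)
        # The node wins if fewer than `needed` competitors finished before it.
        if faster < needed:
            wins += 1
    return wins / trials


UPLOAD = (110, 80)    # pieces started, pieces kept (from the post above)
DOWNLOAD = (39, 29)   # nodes selected, pieces needed (from the post above)

print(round(baseline_cancel_rate(*UPLOAD), 3))    # 0.273
print(round(baseline_cancel_rate(*DOWNLOAD), 3))  # 0.256
```

So even if every node were equally fast, roughly a quarter of transfers would end up canceled. In the toy model, a node that is consistently slower than most competitors loses nearly every race, which would match a very low success rate.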

So, your node is slower than its competitors.
You can’t improve it with a bandwidth or storage limit. The only viable way is to connect your storage directly to the PC and avoid any network-connected storage. If you already have it directly connected, then there is nothing you can do. You can’t win all the pieces.

Would you also say that some direct connections are better (faster) than others? USB 3 vs Thunderbolt 3 vs SATA?
Also, wouldn’t the type of disk being used make some difference? HDD (5400rpm) vs HDD (7200rpm) vs SSD?

From what I remember, it’s how the HDD is connected and how good it is at random reads.

I have a WD RED 5400rpm connected by USB3.0 and sometimes I got 10% success, lately I got 40% success. However, I also have a WD RED connected by SATA3 and it didn’t get any better rates.

The huge difference is between a network-connected drive and a locally connected drive.
USB drives often shut down because of low power or an overheating USB controller, but the speed is almost the same.

Do all “canceled” messages mean that my node has lost the race, or do some of them mean the uplink has aggressively closed the connection (mentioned in Context cancelled on all uploads/downloads)?

@Alexey My HDD is connected via USB 3. I had a node running and had to set up a new one because I was disqualified due to a power outage problem. Before, my node used to respond successfully to most requests; now it rarely succeeds. My setup is:

Another reason I can think of is that the nodes in the network have improved massively and my node has become one of the slower ones. Is that possible?

Mostly whether it’s local or networked. That has by far the most impact. SATA vs USB3 is negligible. As for the HDD itself, since this is uploads, it’s the random write performance that matters. SSDs would be better at that for sure and an SSD cache can also help, but it’s almost never worth the small improvement you’ll see.

The annoying part for people suffering from this is that it’s mostly your location and distance to the customer that’s in the way. So if you’re not using remote storage over a network protocol, there isn’t much you can do. So start lobbying local businesses to use Tardigrade and your node will be first in line for their traffic. :wink:

The amount of traffic on the Storj network would also greatly affect success rates… if the network is fairly busy, whoever has time to try to upload can do so at a relatively slow pace, however if every node is sitting waiting for work, the competition to meet the demand i assume would be much greater…

ofc lot of that kind of stuff is down to the nuances of storj programming which i am clueless about.

anyways it sounds to me like a change in the network stress or programming has affected some nodes greatly…

my numbers look fine… but my node is barely vetted yet… so
slightly above avg from what i read on another thread

ofc the spinning down of disks, as Alexey mentions, can be a huge factor and a pretty sporadic one to put it mildly. because if you have slow traffic continually over a month so the drive never spins down, your response time would be great…

and then a month with little continual traffic but high intermittent peak demands could make a drive spin down in the meantime… and spin-up time for a mechanical hdd is something like 1000ms

ofc in a case of power management, upload and download successrates should correlate… i would assume