Make downloads faster by randomizing stripe order when downloading

The network must optimize for good performance for all participants and longevity of the network, not just for performance of a single download.

As such:

It is not, when you’re downloading many segments concurrently. My home router gives up with a few thousand concurrent connections. We’ve seen threads here showing clients failing with too many concurrent connections, e.g. `Uplink: failed to upload enough pieces (needed at least 80 but got 78)`.

Thankfully, in this case you’re still working with hundreds of connections, so concurrency works; just not at the scale of a single segment.
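To make the distinction concrete: the useful pattern is a bounded pool of connections shared across segment downloads, rather than unbounded per-piece sockets. A minimal sketch; all names here are illustrative, not the real uplink API:

```python
# Sketch only: cap total concurrent piece connections across segments,
# instead of opening thousands of sockets at once.
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

MAX_CONNECTIONS = 200  # hundreds, not thousands; stays below typical router limits

_conn_slots = threading.Semaphore(MAX_CONNECTIONS)

def fetch_piece(node, piece_id, fetch):
    """Fetch one piece while holding a global connection slot."""
    with _conn_slots:
        return fetch(node, piece_id)

def download_segment(pieces, fetch, needed):
    """Race piece downloads; return as soon as `needed` pieces have arrived."""
    got = []
    with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
        futures = [pool.submit(fetch_piece, node, pid, fetch)
                   for node, pid in pieces]
        for fut in as_completed(futures):
            got.append(fut.result())
            if len(got) >= needed:
                for f in futures:
                    f.cancel()  # drop the losers of the race
                break
    return got
```

The semaphore is the point: concurrency across segments is fine, as long as the total connection count is bounded.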

Another factor is that each connection also costs I/O and bandwidth on the node side, even when the node loses the race. If you double the number of connections, an average node sees double the I/O and double the bandwidth. I/O has historically been a problem already, and I bet node operators will get annoyed if their bandwidth doubles without any rewards for the many more races lost.

Yet another factor is bandwidth. In the worst case, when nodes have exactly the same speed, you’re effectively doubling the amount of bandwidth consumed to download a segment: even if you discard the data from half of the nodes, you have already received that data by the time you drop the connection. This might lead to slower nodes dropping out of the network altogether.
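To put a number on that worst case: if every started connection delivers its full piece before the client can cancel it, the bandwidth consumed is roughly (connections requested / pieces needed) × segment size. A tiny illustration, with hypothetical piece counts (these numbers are assumptions for the example, not Storj’s actual parameters):

```python
def worst_case_overhead(requested, needed):
    """Bandwidth multiplier when all nodes are equally fast and every
    started connection delivers its full piece before cancellation."""
    return requested / needed

# Hypothetical numbers: a segment needs 29 pieces, the client requests 29 + 6.
print(worst_case_overhead(35, 29))  # modest overhead over the minimum
print(worst_case_overhead(70, 29))  # doubling the requests doubles it
```

So the overhead scales linearly with the number of connections opened; the extra-nodes margin is a tuning knob, not free.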

And, with the current scheme, I suspect that beyond some number of connections there are diminishing returns in terms of single-segment download speed.

Tough luck :person_shrugging: The 6 additional nodes should cover this case well enough. If not, yes, you just request it again from the satellite. BTW, even AWS S3 requires clients to be prepared to resend a request, because it does happen that they’ll just throw a 5xx randomly. So you can consider this part of the S3 protocol.

Rare enough that it’s not a problem. The 6 additional entries cover this well enough. At some point it’s just a balance of probabilities…

Just trade-offs. And it’s not like other providers don’t make them. Storj does have more flexibility in tuning them, though, so by doing this tuning better they can win on performance.

BTW, what Storj could implement is some sort of connection selection client-side. For example: send the full list of nodes to the uplink, and have the uplink use some sort of local database to track the relative performance of nodes, so it can choose the nodes most likely to deliver pieces quickly. At the current scale this database wouldn’t be that big. If the scale grew, you could still keep some sort of averaged-out statistics, e.g. per /24 bucket.
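A sketch of what that client-side tracking could look like: an exponentially weighted moving average of observed latency per node, a top-k selection, and a /24 bucketing helper for when the node list gets too large. All names, the smoothing factor, and the prior for never-seen nodes are assumptions for illustration, not anything uplink actually implements:

```python
# Sketch: client-side node performance tracking for connection selection.
from collections import defaultdict
import ipaddress

ALPHA = 0.2          # EWMA smoothing factor (assumed)
DEFAULT_PRIOR = 0.5  # assumed latency prior for never-seen nodes,
                     # so new nodes still get a chance to race

class NodeStats:
    def __init__(self):
        self.latency = defaultdict(lambda: None)  # node id -> EWMA latency (s)

    def record(self, node, seconds):
        """Fold one observed download latency into the node's EWMA."""
        prev = self.latency[node]
        self.latency[node] = (seconds if prev is None
                              else ALPHA * seconds + (1 - ALPHA) * prev)

    def pick_fastest(self, candidates, k):
        """Pick the k candidates most likely to deliver pieces quickly."""
        def est(node):
            seen = self.latency[node]
            return seen if seen is not None else DEFAULT_PRIOR
        return sorted(candidates, key=est)[:k]

def bucket24(ip):
    """Aggregate stats per /24 to keep the database small at scale."""
    return str(ipaddress.ip_network(ip + "/24", strict=False))
```

Keying the stats by `bucket24(node_ip)` instead of node id is the averaged-out variant: it trades per-node precision for a database whose size is bounded by the number of distinct /24s.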