Continuous piece download attempt saturating bandwidth

iperf, default settings, 2MB file, over WiFi. The server is running 9 nodes and is currently getting 1 more rsynced to it (so not even ideal conditions).

oops. good luck with that.

I tried to raise this issue in the past. I believe that overloading a small number of nodes that contain pieces of the same very popular file may, in extreme cases, lead to a cascading failure where the file becomes unavailable for customers.

First, the slowest node gets hit by too many requests. Assuming an equal distribution of bandwidth across requests, this means that even if the node was fast enough to serve a small number of requests, it will fail most, if not all, requests when there are too many of them, because all requests will be served at a similarly slow speed. Then this load spreads to other nodes, which would normally serve just an equal fraction of the traffic and now have to serve more of it, again potentially overwhelming them as well. In effect, even though there are nodes that could have been serving traffic, most of their bandwidth is spent on inevitably cancelled downloads.
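To make this concrete, here is a back-of-envelope illustration (the numbers are made up, assuming a fair bandwidth split and a race that is effectively lost after about a second):

100 Mbit/s uplink, 2 MB (= 16 Mbit) pieces, N concurrent requests:
time per piece ≈ N × 16 Mbit / 100 Mbit/s = 0.16 × N seconds
N = 5  →  ~0.8 s each: most requests still finish in time
N = 20 →  ~3.2 s each: nearly every request gets cancelled, yet the uplink stays saturated the whole time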

Obviously, as a storage node operator I would much prefer to handle only downloads that have a high chance of succeeding. This is difficult to predict though, so in the post linked above I suggested tracking available bandwidth and not committing to serve a download until the node is sure it can do so at a decent speed.

Right now we seem to have plenty of nodes with very high bandwidth, making this scenario very unlikely, so I don’t think it’s urgent to do anything. On average, even low-bandwidth nodes can serve most traffic, and in the peaks (or when nodes are running maintenance procedures like the file walkers) we still seem to have a lot of high-bandwidth nodes. But when the need arises, I suspect this suggestion might be cheaper to achieve than having more pieces for popular files.

4 Likes

Not trying to break any speed records, just trying to prove that you don’t need a 2MB/s (=16Mbps) connection to transfer one 2MB file just to register on a graph. It’s clearly shown there: it transferred as fast as the current WiFi connection would allow (~150Mbps). I could get a 1Gbps connection saturated for a fraction of a second, but I’m not bothered enough to go find a patch cable just to do that.

2 Likes

I’d like to compare. Can you share the command you used?

On server:

iperf -s

On client:

iperf -c (your_server_ip)  -p 5001 -n 2M

Make sure port 5001 is open on the server. Both machines are running Debian Linux, of course, with the “iperf” package installed.
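If there’s a firewall on the server, the port has to be opened first, e.g. with ufw (assuming that’s what you run; adjust for iptables/firewalld):

sudo ufw allow 5001/tcp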

Thank you, I was getting crazy numbers.

1 Like

If a client gets a list of 80 nodes it can contact to retrieve any particular piece of data… how do you ever have “a small number of nodes” experiencing overloading? (Aren’t there at least 50 nodes for any data right now, if we look at the Healthy Pieces (min) values?)

But yeah… even with 50-80 nodes involved… one will always be “the slowest” for a particular chunk of ones and zeroes at a particular point in time. It should vary based on geography and the data requested… but some people just have poor Internet connections and slow HDDs :wink:

So to simplify this, you’re saying “Slow nodes will lose more races than faster nodes… and will see a higher number of cancelled downloads”? And yet… the paying client got their data sooner from the fastest nodes, and those faster nodes get more coins from more egress. That would be as-designed and as-expected, wouldn’t it: race winners get more?

As a storage node operator you only have control over the general quality and speed of your Internet connection, and the speed and latency of your storage. Those things determine if you win races: whether Satellites are tracking bandwidth samples over time or not.

If there was tracking… slower nodes would no longer be “maybe-slow so can still try for some data”… they would be “confirmed-slow so don’t use them at all”. Who wins then? And you can’t configure some sort of welfare system for slower nodes… without making clients wait longer for their data.

Serving and storing data must be a race.

1 Like
Accepted connection from 192.168.2.152, port 49838
[  5] local 192.168.2.189 port 5201 connected to 192.168.2.152 port 49839
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.08   sec  1.86 MBytes   201 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.08   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-0.08   sec  1.86 MBytes   201 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201

So I’m trying to test with a 10gig NIC and it will not go much higher than this.

Changed to 20MB and it went up a bit higher

Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.2.152, port 50268
[  5] local 192.168.2.189 port 5201 connected to 192.168.2.152 port 50269
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.58   sec  19.9 MBytes   285 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.58   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-0.58   sec  19.9 MBytes   285 Mbits/sec                  receiver

I can’t really get consistent numbers.

Another server on 10gig

Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.2.189, port 57232
[  5] local 192.168.2.21 port 5201 connected to 192.168.2.189 port 57233
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.07   sec  1.81 MBytes   229 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.07   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-0.07   sec  1.81 MBytes   229 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
[  5] local 192.168.2.21 port 5201 connected to 192.168.2.189 port 57333
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.93   sec   200 MBytes  1.81 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-0.93   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-0.93   sec   200 MBytes  1.81 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Mine did 30x! Yay! Finally, my 5TB node overtakes my 10TB node :partying_face:

Overheads, including MTU. There is a limit to how much data you can transmit if you only send a few packets down a very fast line; they can only get there so fast. This starts showing up at 10Gbps, as you have seen.

When benchmarking high-bandwidth links (in my line of work we go up to 1Tbps, yes, tera), you need many packets in flight on that link to get any meaningful data out of it. Ways to do that: increase the MTU (the packet size), cut out TCP overhead (handshakes, acknowledgements) by running over UDP, and add parallelism (i.e. send multiple streams down the line). For all intents and purposes, you need multiple hosts running as servers and many (MANY) more acting as clients.
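For example, with iperf3 it could look something like this (the interface name, rates and stream counts are just placeholders, adjust them for your own setup):

# jumbo frames, if the NIC and the switch support them
sudo ip link set dev eth0 mtu 9000
# 8 parallel TCP streams for 30 seconds
iperf3 -c (your_server_ip) -P 8 -t 30
# UDP test with a target rate and 4 parallel streams
iperf3 -c (your_server_ip) -u -b 2G -P 4 -t 30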

As the proverbial golden rule goes: In theory, theory and practice are the same. In practice, they are not.

Sorry, you have totally missed my point: for a specific node, increasing the number of download requests that node receives past saturation of bandwidth will start decreasing the number of downloads completed, potentially to zero. Yes, maybe the node is too slow to serve all of these download requests, but it still should be able to serve some of them.

I think I understand now. That’s the case for every SNO on the network, isn’t it (not just slower ones)? Or even every service serving anything on the Internet? :wink:

Either a faster node will complete the download request (which happens, from what I can tell, 100% of the time now)… or the never-happened-before-theoretical-overwhelming-load slows all 50-80 nodes serving a particular piece of data… and the client simply experiences a longer completion time. But even in that case… the fastest-of-the-overwhelmed-nodes will win more races.

I think what you’re asking for is for satellites to artificially slow down client transfers… to give known/tested-slower SNOs the opportunity to s l o w l y send data to clients (instead of those clients completing earlier using faster nodes)?

How would Storj market such a feature to paying clients? Maybe:

We have over 24000 systems protecting your data: and we’ll throttle your transfers to the speed of the slowest ones for only $4/TB/month!

May need a bit of polish. How would you sell it?

This could be true only for a particular location, but the same node can be incredibly fast for a customer in the next building. So such tracking looks like a waste of time and resources; the network will rebalance itself, especially when some pieces get recovered and distributed to other nodes if the slowdown is caused by hardware issues (and such a node could be disqualified due to audit failures because of long timeouts, >=5 minutes/piece).

1 Like

This is not possible because the traffic goes directly between nodes and clients, and not through the satellite.
But maybe you mean the GatewayMT? In that case it could be possible to implement, but why?

This isn’t my idea so I’m not that invested in it: but the satellites do tell the clients which nodes to connect to. Imagine if satellites always sent over their 80-110 nodes… at say one node IP every 2 seconds… sorted from slowest to fastest. For large transfers it may take the client a couple of minutes to even see the IPs of the fastest SNOs… and in the meantime they’ve been exchanging data with the slow ones. For small transfers they’d complete before even hearing about the fastest options.

The end result would be clients that would eventually still hear about all 80-110 SNOs, so you’d maintain availability guarantees… but their average transfer speeds would be slower because of the artificial satellite bias to tell them about slow nodes first.

I think that would be a horrible idea… and as a client I’d never pay Storj for such a service. But I believe satellites could impact client transfer speeds: simply by delaying those clients from knowing what their faster options are.

You’ll have to scroll up a bit: Toyoo has been trying to explain why to me. In their example link and comment (“increasing the number of download requests that node receives past saturation of bandwidth will start decreasing the number of downloads completed, potentially to zero”) - it sounds like a concern that slow nodes can get into a state where they win few/zero transfer races? So maybe they want Storj to make technical changes so they’d win more in high-traffic situations (at the expense of client speeds, and faster-SNO earnings).

Welfare for slow SNOs?

I think the current “race” system works well, and benefits the clients who are ultimately paying for it all. And if you have a node with poor performance… you should expect to earn less with it.

1 Like

Yep.

Not if the service employs queuing, like a properly designed network service.

No, I’m not asking for that, and again you are not even trying to understand my posts, and instead choose to mock them. So there’s no point in discussing the idea with you, sorry.

How would it know which one is faster or slower for that customer’s location?
If we implemented a speed test before starting a download or upload in libuplink, every single upload or download would be slower than it is now. That’s not what we need.
We just cache nodes which were used last time (and successfully provided either space or the piece) for some time interval. Essentially, this is as close to your proposal as possible.

The satellite provides IPs to the libuplink, not hostnames.
BTW, SNO = Storage Node Operator; I wouldn’t be happy if the satellite provided me to the customer instead of my nodes…

This is not how it works. The nodes are selected randomly, so every node has an equal chance to be selected. The only exception is if they are hosted in the same /24 subnet; in that case only one of them will be selected for each segment of a customer’s upload. But in the end, they will all be selected, just for different segments of different customers.