Updates on Test Data

I may be misunderstanding a few things, but doesn’t the /24 rule take care of spreading over the network? As far as I understand it, it just focuses on the fastest nodes after spreading the data out over different /24s. If a node is fast, it makes sense for it to get more data. A paying customer isn’t going to wait on a Raspberry Pi with an SD card (as the node’s storage) for their upload, imho.

1 Like

If a popular node fails, the impact on the network (increased risk of lost segments, repair costs) is bigger. It’s in the interest of Storj to minimize this risk by spreading pieces as equally as possible over nodes (or /24 blocks)—potentially trading off some performance.

Besides, a large number of /24 blocks might be located at a single ISP. If such an ISP is selected often because it’s “close” to the uplink, then again the risk is high due to a potential correlated failure. The /24 rule alone does not prevent this, sure, and combining it with a speed metric only makes the risk bigger.
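
To make that concrete, here is a rough sketch in Go of what a /24-style filter could look like (hypothetical helper names, not Storj’s actual implementation); a speed metric would then rank only the survivors of this filter, which is exactly where the correlated-ISP concern creeps in:

```go
package main

import (
	"fmt"
	"net"
)

// subnetKey reduces an IPv4 address to its /24 network, so at most one
// node per /24 block can be chosen. Hypothetical helper, not Storj code.
func subnetKey(addr string) string {
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return addr // non-IPv4: fall back to the raw address
	}
	return ip.Mask(net.CIDRMask(24, 32)).String()
}

// pickOnePerSubnet keeps the first candidate seen in each /24 block.
func pickOnePerSubnet(candidates []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, c := range candidates {
		key := subnetKey(c)
		if !seen[key] {
			seen[key] = true
			out = append(out, c)
		}
	}
	return out
}

func main() {
	nodes := []string{"203.0.113.10", "203.0.113.77", "198.51.100.5"}
	// The two 203.0.113.x nodes share a /24, so only one survives.
	fmt.Println(pickOnePerSubnet(nodes)) // [203.0.113.10 198.51.100.5]
}
```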

Could we perhaps tweak the network to select the fastest nodes for the client’s upload, then spread the data over the rest of the network slowly afterwards? (depending on the customer’s needs, of course)

1 Like

That seems like a lot of wasted bandwidth. I think it’s just a balancing game. If nodes get data apportioned to their capabilities, that should still lead to wide distribution while also optimizing performance. It’s just not at all easy to accomplish. I could think of ways to calculate that, but they would be too slow for node selection. So the things @littleskunk is testing are just fast approximations of that, which I think is the right way to do it. It just takes some fine-tuning.

Edit: To clarify, with 16/20/30/38 RS settings the ideal situation is that every node loses 21% of races: 8 out of 38 pieces (8/38 ≈ 21%). This means the load is spread equally based on capability. If a node loses less than 21%, it should be selected more often, because that means other nodes lose more than 21% of races, and it means performance is left on the table by overloading slower nodes and not utilizing faster nodes to the best of their capabilities. The best node selection system is the one that balances this percentage to equal out over all nodes. But the real world is messy, and some really slow nodes will always drag things down a bit. So it’s more complicated than just this: some nodes may never achieve 21% lost races and will always lose more.
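
A quick sanity check on that arithmetic, as a toy Go sketch (illustration only, not Storj code):

```go
package main

import "fmt"

// With RS settings 16/20/30/38, every upload starts 38 transfers and
// cancels the slowest 8 once 30 succeed, so the ideal long-tail loss
// rate per node is 8/38.
func main() {
	const total, needed = 38.0, 30.0
	ideal := (total - needed) / total
	fmt.Printf("ideal lost-race rate: %.1f%%\n", ideal*100) // ~21.1%

	// Toy adjustment rule: nodes losing fewer races than ideal have
	// spare capacity and should be selected more often; nodes losing
	// more are overloaded and should be selected less often.
	for _, observed := range []float64{0.10, 0.30} {
		if observed < ideal {
			fmt.Printf("node losing %.0f%% of races: select it more often\n", observed*100)
		} else {
			fmt.Printf("node losing %.0f%% of races: select it less often\n", observed*100)
		}
	}
}
```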

3 Likes

As I understand it from talking with LittleSkunk, less optimal nodes will still get data at a rate they can handle.

A lot of bits are being sent to a lot of nodes. If a node gets too much traffic, it loses races, so the satellite selects it less often, but still often enough to utilize it optimally.

This is better for slower nodes: you would still get about the same data you normally would, but you won’t waste as much bandwidth because you won’t lose as many races.

In this way, all the nodes are utilized to the best of their ability.

Think of it this way: a fast node might take 10 uploads at once, whereas a slow node might take 3. If you sent 12 transfers to the fast node, it wouldn’t be able to handle 2 of them, so the slow node takes those 2. Or the slow node could take 3 of the 12, and the fast node could take 9. In the end, everyone is getting data, because the volume of transfers should keep every node busy with the optimal number of transfers it can handle.
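
Here is a toy Go sketch of that proportional split (hypothetical capacities, not the satellite’s actual logic):

```go
package main

import (
	"fmt"
	"math"
)

// Toy illustration of capacity-proportional load: each node takes a
// share of incoming transfers proportional to how many concurrent
// uploads it can handle. In practice, capacity could be estimated
// from lost races rather than known up front.
func main() {
	capacities := map[string]int{"fast": 10, "slow": 3}
	const transfers = 12

	totalCap := 0
	for _, c := range capacities {
		totalCap += c
	}
	for name, c := range capacities {
		share := int(math.Round(float64(transfers*c) / float64(totalCap)))
		fmt.Printf("%s node takes ~%d of %d transfers\n", name, share, transfers)
	}
	// Output: fast node takes ~9 of 12, slow node takes ~3 of 12.
}
```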

2 Likes

@littleskunk’s posts suggest that the fast nodes have so much spare bandwidth or IOPS capacity that they never (or only negligibly rarely) get loaded down to the level of the slower nodes.

And they shouldn’t be. But even the fastest nodes can only take so many connections at once. And slower nodes shouldn’t be trying to serve transfers if they are already overloaded.

2 Likes

I hope this is the correct part of the graph for those tests. The times are in GMT+3:

It did not affect the iowait much though:

The CPU usage went up a bit, but not too bad:

How long my node took to process the requests (too bad the logs do not show fractions of seconds anymore):

Concurrent requests (I do not know why the spikes happen, or whether they are real or just some measurement inaccuracy):

Success rate went down a bit:

But still, at peak, my node was doing ~140 requests per second:

@littleskunk I get that the speeds are awesome with the new node selection models, but what about piece distribution?
Are pieces well distributed globally, or concentrated in small geographic areas?
I hope you guys up there don’t forget about this aspect.

1 Like

this is really very thick humor

I know, sorry about that. Here we use this word for some reason to describe weak devices. I do not like it, but somehow it is self-explanatory.

Back to the topic. Yes, some routers are made to be as cheap as possible, with caveats such as not having enough resources to handle a big NAT table, so they just start dropping packets, which usually doesn’t help.
I have dealt with D-Link routers that needed to be rebooted every 18 hours to work normally… They were also very sensitive to the power supply (a 0.1V difference could drive them crazy, which was especially “fun” given that it was their native power supply…).

Thus, if you run a large number of nodes, you need to use something like pfSense. I do not want to pick on Mikrotik, but I have had a dozen tickets from users who could not finish a Graceful Exit (which transfers pieces to other nodes) because of a high failure rate, and when they connected directly to the ISP’s Ethernet all the problems magically went away… So I have had a weird feeling about Mikrotik since then; see also: Uplink: failed to upload enough pieces (needed at least 80 but got 78) - #5 by andy01.

1 Like

I have had a bad opinion of cheap routers since long ago, when I needed to reboot my router once or more per day because it could not handle torrent connections even on a 4 Mbps ADSL line.

I have been using servers/PCs as routers since then. pfSense is good. So far I have had no problems with Mikrotiks, and we have some installed for various clients (some uses are rather demanding too). OTOH, for high usage I prefer a plain Linux router (just iptables etc., no GUI) and use a couple at home with failover. I have not tried to run a Storj node behind a Mikrotik, so I cannot say how suitable they are for this particular use case.

The node uses a lot of connections; here are the counts from the conntrack table of my router:

     12 CLOSE
    795 ESTABLISHED
      2 LAST_ACK
    565 TIME_WAIT
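
In case anyone wants to reproduce such counts, here is a minimal Go sketch that tallies TCP states from /proc/net/nf_conntrack (assuming the common line layout; reading the file requires the nf_conntrack module to be loaded and usually root):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Tallies TCP connection states from the kernel's conntrack table.
// Assumes the common /proc/net/nf_conntrack layout, where TCP lines
// look like: "ipv4 2 tcp 6 <timeout> ESTABLISHED src=... dst=...".
func main() {
	f, err := os.Open("/proc/net/nf_conntrack") // usually needs root
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	counts := map[string]int{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 5 && fields[2] == "tcp" {
			counts[fields[5]]++ // state, e.g. ESTABLISHED, TIME_WAIT
		}
	}
	for state, n := range counts {
		fmt.Printf("%7d %s\n", n, state)
	}
}
```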

I have no other evidence about Mikrotik routers except these tickets and the linked thread (perhaps it depends on the firmware or something else); however, I have a strong feeling that something goes wrong when the number of parallel connections is too high (how high does it have to be, by the way, to make it drop packets?).
There are not many complaints here on the forum either, so maybe it is a misconfiguration; however, the example discussed here only adds fuel to my feelings.

Ah… the good old days! When BitTorrent was new and could knock over pretty much any router ISPs were giving to their customers. So many support calls! :wink:

My Unifi router seems to be handling it, although it did break a bit of a sweat. :wink:

I have a Mikrotik and it works very fast; 300 Mbit of traffic right now is just 9-20% CPU usage. But it is not a home router. Home routers are not made for real loads, only for typical home users.
For this kind of load you need at least premium or pro-grade routers; then the experience will be good.

2 Likes

It probably also depends on the hardware used. Mikrotik uses the same software for very different hardware, and maybe some of the older/slower/cheaper devices cannot handle too many connections. It’s great that Mikrotik allows old devices to be updated to the newest software and that all their devices run the same software (I don’t like SwOS, so I pretend it does not exist), but the same software does not mean the same capabilities.

I have a Mikrotik RB433, probably made around 2008 (originally it came with RouterOS v3), that can run the newest version of RouterOS. However, with 64MB of RAM it probably wouldn’t be able to handle too many connections. A brand-new hEX lite also has 64MB of RAM, but has an 840MHz CPU instead of the RB433’s 300MHz. I use it as a 100M switch.
The CRS series are great switches, but while you can use them as routers, you really shouldn’t; they have weak CPUs.

3 Likes

(meme image)

2 Likes

When does it rain?

I was told Storj was “Gonna make it rain!” :money_mouth_face:

1 Like

In next month’s payout :rofl:

2 Likes