Sustained 200 Mbps ingress

Mine sounds and looks (going by the dashboard graphs) like it is on fire. But I am happy to know that my 6 GB VM and firewall can sustain these speeds without any packet drops.

I had a quick look at the changes around node selection that came with the cache implementation. They seemed rather substantial. This result seems to suggest the subnet filtering may have been removed or circumvented by those changes.
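For anyone skimming, here is a minimal sketch of what the per-/24 filtering is expected to guarantee. The types and field names are made up for illustration, not taken from the actual satellite code: at most one node per last_net gets picked, so several nodes behind the same /24 share a single selection slot.

package main

import (
	"fmt"
	"math/rand"
)

// Node is a stand-in for the satellite's view of a storage node.
// The field names are illustrative, not the real schema.
type Node struct {
	ID      string
	LastNet string // the /24 network, e.g. "203.0.113.0"
}

// selectDistinctSubnets returns at most count nodes, never more than one
// per /24, which is what the DistinctIP behaviour is meant to enforce.
func selectDistinctSubnets(nodes []Node, count int) []Node {
	bySubnet := make(map[string][]Node)
	for _, n := range nodes {
		bySubnet[n.LastNet] = append(bySubnet[n.LastNet], n)
	}

	// one random candidate per subnet
	candidates := make([]Node, 0, len(bySubnet))
	for _, group := range bySubnet {
		candidates = append(candidates, group[rand.Intn(len(group))])
	}

	// then pick count subnets at random
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	if count > len(candidates) {
		count = len(candidates)
	}
	return candidates[:count]
}

func main() {
	nodes := []Node{
		{"a", "203.0.113.0"}, {"b", "203.0.113.0"}, {"c", "203.0.113.0"},
		{"d", "198.51.100.0"}, {"e", "192.0.2.0"},
	}
	// never returns two nodes from 203.0.113.0
	fmt.Println(selectDistinctSubnets(nodes, 2))
}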

That is interesting.

@570RJ how many nodes do you have in the same /24?

The same on my node

from europe-north-1.


Just to make sure: who is seeing more traffic and is running multiple nodes on the same IP? Who still has the same load and is running only one node? Any other combinations?

The node selection cache should not cause higher throughput. It looks like the node selection is currently not working as expected. We will disable the node selection cache and fix that issue.

DistinctIP       bool          `help:"require distinct IPs when choosing nodes for upload" releaseDefault:"true" devDefault:"false"`

devDefault is false? Is it running as dev?
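For readers not familiar with those struct tags, here is a rough sketch of the idea behind releaseDefault/devDefault. This is a simplification, not the real config loader: the same option resolves to a different default depending on whether the process runs as a release or a dev setup, so DistinctIP would only be off by default in dev.

package main

import "fmt"

// defaultDistinctIP mimics the idea behind the releaseDefault/devDefault
// tags shown above: the effective default depends on how the process is run.
// This is illustrative only, not the actual config loading code.
func defaultDistinctIP(isRelease bool) bool {
	if isRelease {
		return true // releaseDefault:"true" -> distinct IPs enforced
	}
	return false // devDefault:"false" -> relaxed for local dev setups
}

func main() {
	fmt.Println("release default:", defaultDistinctIP(true))  // true
	fmt.Println("dev default:    ", defaultDistinctIP(false)) // false
}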


More traffic and running only one node.


That should all be correct. The difference might be between randomly selecting and then grouping vs. grouping first and then selecting. That makes a difference in the number of chances you have to get selected.
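To make the "number of chances" point concrete, here is a back-of-the-envelope sketch with made-up numbers (10,000 nodes, 9,000 distinct /24s, 3 nodes in one subnet):

package main

import "fmt"

func main() {
	const (
		totalNodes   = 10000.0 // hypothetical network size
		totalSubnets = 9000.0  // hypothetical number of distinct /24s
		nodesInMine  = 3.0     // nodes one operator runs in a single /24
	)

	// "Group first, then select": each /24 is one candidate,
	// so a subnet's chance per selected slot is 1 / totalSubnets.
	groupFirst := 1.0 / totalSubnets

	// "Select randomly, then group/dedupe": every node is a candidate,
	// so a subnet with 3 nodes gets roughly 3 chances per slot.
	selectFirst := nodesInMine / totalNodes

	fmt.Printf("group-first  per-slot odds: %.5f\n", groupFirst)  // ~0.00011
	fmt.Printf("select-first per-slot odds: %.5f\n", selectFirst) // ~0.00030
}

With select-first, a subnet holding three nodes gets roughly three times the chances of a single-node subnet; with group-first, every subnet gets the same odds regardless of how many nodes it contains.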

I am running a single node; my traffic looks like this:

Total: [graph]

europe-north: [graph]

Other sats less than a megabit each.

Yes, it should definitely be group first and then select.
Worth looking into how SELECT DISTINCT ON is implemented in combination with a limit. I would have thought it groups first, then filters, but it could be that it just iterates over the results and skips duplicates on last_net until it reaches the limit.
Additionally, I think a reputable node and a new node could be selected on the same subnet. That wouldn't explain this increase though. << EDIT: Ignore this, it's caught elsewhere in the code.

EDIT: Other theory: the ORDER BY on last_net could be an issue if the ORDER BY is applied before the LIMIT, which I think it is. I'll stop now, it's indeed getting late. :slight_smile:
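To make the DISTINCT ON / ORDER BY / LIMIT concern concrete, here is the rough shape of the query being discussed, as illustrative strings only and not the satellite's actual SQL. In Postgres, DISTINCT ON (last_net) keeps the first row per last_net according to the ORDER BY, and the LIMIT is applied afterwards, so the ordering decides both which node within a subnet wins and whether the choice of subnets is still random.

package main

import "fmt"

// Because the result of DISTINCT ON is still ordered by last_net, this LIMIT
// would tend to return the lowest-sorting subnets rather than a random sample.
const biasedSelection = `
SELECT DISTINCT ON (last_net) id, last_net
FROM nodes
ORDER BY last_net, random()
LIMIT 10
`

// Wrapping the de-duplication in a subquery and randomizing the outer ordering
// keeps "one node per /24" while making the choice of subnets random.
const randomizedSelection = `
SELECT id, last_net FROM (
    SELECT DISTINCT ON (last_net) id, last_net
    FROM nodes
    ORDER BY last_net, random()
) AS one_per_subnet
ORDER BY random()
LIMIT 10
`

func main() {
	fmt.Println(biasedSelection)
	fmt.Println(randomizedSelection)
}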

btw, I run a single node and saw a slight increase, but nothing close to what others are reporting.

Good that we have a feature flag for this one. We will disable it, add some more code to collect additional monkit data about how often each node gets selected, and next release we can try again and gather more data.

Plan B: keep it enabled but reduce the number of uploads.

It is getting late. One of these solutions will win short term and then we can figure out how to fix it.

Thanks for the quick notice.


Not sure if related, but in this topic it was mentioned that 2 nodes on the same IP saw really different amounts of ingress.

It stopped

Same here, all nodes are at <1Mbps ingress. Down from 100Mbps total.

To clarify my numbers, I am running 6 nodes (4 in one location, 2 in another) and I'm seeing only a very minor increase in the amount of traffic in one location (~50% increase) and no noticeable difference in the other.

EDIT: None of my nodes are vetted on this satellite. If we assume I'm getting 5% odds at uploads compared to a vetted node, my increase in traffic would be 1000% if the nodes were vetted.
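Spelling out that estimate (taking the 5% figure above as an assumption rather than a confirmed number): a ~50% increase at 5% of a vetted node's upload odds scales to roughly a 1000% increase for a vetted node.

package main

import "fmt"

func main() {
	observedIncrease := 0.50 // ~50% more traffic, as reported above
	unvettedOdds := 0.05     // assumed share of upload chances vs a vetted node

	// If the node had full (vetted) odds, the same effect would scale up:
	vettedEquivalent := observedIncrease / unvettedOdds // = 10, i.e. ~1000%
	fmt.Printf("~%.0f%% increase for a vetted node\n", vettedEquivalent*100)
}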

I'm seeing nothing, my node is dead like a tomb…

Or the activity level is… still getting a download here and an upload there… been like that.

Kinda looks like it was at the stroke of midnight…

Why would you complain about 200 Mbps ingress… xD

@littleskunk probably stopped the tests

Yeah, I would assume so; reconfiguration is rarely a quick job on large stuff.
But hey, then I've got time to scrub my pool again… the first scrub looked kinda horrible… I might also consider trying to clean my backplane… had yet another drive throw a few read errors, which so far always seems to be related to corrosion.

I'm getting better at locating my drives physically in the server… it does require a bit of an eye for it… but at least I ordered the HBA connections so it sort of makes sense when looking at the backplane.

Funny to see how much my success rates tanked now that the network is almost idle… down to like 71-72%, but I suppose the less activity there is, the further away the storagenodes fighting for connections are geographically, maybe… still, I suppose I should be happy being in the top 1/3 on my old gear… some of the images in the "post pictures of your storagenode" thread are really impressive.

Also got to thinking that maybe I disregarded one of my 3 TB drives due to the backplane, meaning I could have had 5 drives instead of 4… though I wouldn't have had a bay to put it in anyway, not before I successfully clean it…

I wouldn't call it complaining, but it turns out it was actually useful feedback. :+1:

I do miss the traffic now, but I'm sure they're working on a more solid solution in the meantime. Besides, it gives all the SMR nodes some time for housekeeping. :wink:

I wouldn't mind a couple of months of 200 Mbit/s so I could fill up my node…
But now I'm also doing some housekeeping, getting a second scrub in because I didn't like the result of the first one… is 25,000 incorrect checksums bad? xD
The 2nd scrub is 63% done now, with only 5 this time, so hopefully that trend holds.