IP Filtering, Deep Dive

aeonaura · August 29, 2019, 1:31am

This post goes over IP filtering:

With today’s v0.14.3 release, we’ve implemented a feature called IP filtering, which will ensure that no file pieces corresponding to the same file are stored in the same geographical area, based on logical subnets.

I'm wondering how this works in practice. I've been promoting the Storj Network to many people in my community who use the same ISP, which means we are sometimes in the same IP subnet (anywhere from /8 to /24).

What counts as a “logical subnet”? /8, /16, /24, /29?
Does the IP filtering mean that all nodes within the same /24 are filtered together?

Do you have an internal way to determine geolocation from the Docker/node code, and is it based on the public IP?

I totally understand that you need safeguards to prevent something like the city-wide outage mentioned in the post.
I just want to fully understand how this is going to work!

Thanks all!

Alexey · August 29, 2019, 6:26am

The goal is to avoid storing pieces of the same segment in one physical place.
We don't have a better filter at the moment, so we use a /24 IP mask in the node selection process. From the filtered set, two nodes are chosen, and the better of the two (by reputation) is used.
This means that all nodes within the same /24 are treated as a single node.

This filter doesn't apply to audits.
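A minimal sketch of the selection described above, assuming each node is a simple record with an IPv4 address and a reputation score, and reading "two nodes are chosen and the better one is used" as happening inside each /24. The `Node` type, its fields, and the function names are hypothetical; this is not the satellite's actual code.

```python
import ipaddress
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node record: ID, public IPv4 address, reputation score.
    node_id: str
    ip: str
    reputation: float


def subnet_24(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network containing `ip` (203.0.113.7 -> 203.0.113.0/24)."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def group_by_subnet(nodes):
    """Bucket nodes by /24 so that each subnet counts as a single slot."""
    groups = defaultdict(list)
    for node in nodes:
        groups[subnet_24(node.ip)].append(node)
    return groups


def select_for_segment(nodes, count, rng=random):
    """Pick up to `count` nodes for one segment, never two from the same /24.

    Inside each sampled subnet, draw two random candidates (one if the subnet
    has a single node) and keep the one with the higher reputation.
    """
    groups = group_by_subnet(nodes)
    subnets = rng.sample(list(groups), min(count, len(groups)))
    chosen = []
    for net in subnets:
        members = groups[net]
        candidates = rng.sample(members, min(2, len(members)))
        chosen.append(max(candidates, key=lambda n: n.reputation))
    return chosen
```

With two nodes in 203.0.113.0/24 and one in 198.51.100.0/24, at most two nodes can be chosen for a segment no matter how large `count` is, and for a subnet holding exactly two nodes the slot always goes to the higher-reputation one. That is what "all nodes from the /24 are considered as one node" looks like in practice.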

aeonaura · August 29, 2019, 6:56am

Time to stop promoting in my community…

Is there a way to check whether nodes are being filtered?
Could they be placed in a new /24 pool?

Thanks for the quick response!

KernelPanick · August 29, 2019, 3:45pm

storjnet.info could help provide this.

Return all server addresses.
Resolve the DNS names to IP addresses.
Keep only the IPs and sort them.
Detect duplicates on the first 3 octets (x.x.x.y).
Display any nodes resolving to those duplicate prefixes.
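The steps above could be sketched like this, assuming you already have a mapping of node IDs to `host:port` server addresses (how to pull those from storjnet.info or elsewhere is not shown). The function names are hypothetical, and only IPv4 is handled.

```python
import ipaddress
import socket
from collections import defaultdict


def resolve(address: str) -> str:
    """Resolve a 'host:port' server address to an IPv4 address string."""
    host = address.rsplit(":", 1)[0]
    return socket.gethostbyname(host)


def duplicate_prefixes(addresses, resolver=resolve):
    """Map each x.x.x prefix shared by two or more nodes to those nodes.

    `addresses` maps node ID -> server address ("host:port").
    """
    # Resolve every address, then sort numerically so neighbours sit together.
    resolved = sorted(
        (ipaddress.IPv4Address(resolver(addr)), node_id)
        for node_id, addr in addresses.items()
    )
    by_prefix = defaultdict(list)
    for ip, node_id in resolved:
        prefix = ".".join(str(ip).split(".")[:3])  # first three octets
        by_prefix[prefix].append((node_id, str(ip)))
    # Keep only prefixes that more than one node resolves to.
    return {p: nodes for p, nodes in by_prefix.items() if len(nodes) > 1}
```

Passing a custom `resolver` lets you test the grouping without any DNS lookups; the default uses `socket.gethostbyname`, which returns numeric IPv4 addresses unchanged.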

aeonaura · August 30, 2019, 11:30pm

That’s a great idea!

BlackDuck · August 31, 2019, 7:34pm

There is a misunderstanding of the filtering. The filter/selector just ensures that at most one piece of a segment is stored in any one /24 segment. The node within that /24 is selected at random, so every node will still get data.
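A small simulation of the behaviour described above, assuming the node inside a selected /24 is drawn uniformly at random with no reputation weighting. The function names are hypothetical.

```python
import random
from collections import Counter


def pick_in_subnet(members, rng):
    """Choose one node uniformly at random among the nodes sharing a /24."""
    return rng.choice(members)


def simulate(members, uploads, seed=0):
    """Count how often each node is picked when its /24 is selected `uploads` times."""
    rng = random.Random(seed)
    return Counter(pick_in_subnet(members, rng) for _ in range(uploads))
```

Calling `simulate(["node-a", "node-b", "node-c"], 30_000)` gives each node roughly a third of the selections: every node gets data, but the subnet's share is split between them.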

aeonaura · September 1, 2019, 3:49am

Right, if only one node gets selected in a given /24, the top reputation node will win.

That makes it much harder for the other neighboring nodes in the same /24 to get data. I'm not sure how reputation reacts when you receive data. I'd imagine something like a “received data” stat that drops when a node receives data and builds back up over time.

BlackDuck · September 1, 2019, 11:25am

The selection within a /24 segment is purely random!

KernelPanick · September 1, 2019, 6:35pm

Here's how I currently understand it, though I could be wrong.

Nodes A and B are on the same /24.

Node A wins the random selection for a piece, but it does not win the race against the other nodes and is context-cancelled before finishing.

Node B would have won the race if it had been selected instead, because it performs better than Node A.

Node A is less likely than an unfiltered node to be selected for pieces because it shares its selection slot with Node B. Node A will also receive fewer pieces than Node B, because it performs worse.

Node B will get more pieces than Node A because it performs better, but it still gets far fewer opportunities to compete than an unfiltered node.

I would suppose that having an unfiltered node matters much more than having a better-performing node, unless traffic reaches a point where it's bottlenecking the node.
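The reasoning above can be put as back-of-the-envelope arithmetic, treating subnet selection, the random pick inside the /24, and winning the upload race as independent (a simplification). The function name and all numbers below are hypothetical.

```python
def piece_probability(p_subnet: float, nodes_in_subnet: int, p_win_race: float) -> float:
    """Chance that one particular node ends up storing a piece of a given segment.

    p_subnet:        chance the node's /24 is among the subnets selected for the upload
    nodes_in_subnet: how many nodes share that /24 (a uniform pick inside it is assumed)
    p_win_race:      chance the node finishes before it is context-cancelled
    """
    if nodes_in_subnet < 1:
        raise ValueError("a subnet must contain at least one node")
    return p_subnet * (1 / nodes_in_subnet) * p_win_race
```

With an illustrative `p_subnet` of 0.02, a fast unfiltered node (race win 0.9) gets about 0.018 per segment. Put that same fast node (Node B) in a /24 with a slower Node A (race win 0.6) and they get about 0.009 and 0.006, so even their combined 0.015 is below the single unfiltered node.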

BlackDuck · September 1, 2019, 7:32pm

That is how it works right now.

The filtering must be done; without it, reliability is compromised.

You could factor a few more things into your calculation. When are you really in the same /24 as others? Most of the time, in a datacenter. Those nodes will win the race more often than users on a home connection.

aeonaura · October 10, 2019, 2:14am

I've been pondering this IP filtering problem and think I have a fair solution for it.

Keep the /24 IP filtering. Keep the current reputation selection process with a slight alteration.

If there is only one node in the /24, the alteration would still apply but have no effect.

Example:
If there are 2+ nodes, the alteration starts to work.

The node with the highest reputation is selected first.

If two or more nodes have the highest obtainable reputation, the alteration comes into play.

Node 1: Rep 5000/5000 | Alteration 100/100
Node 2: Rep 5000/5000 | Alteration 100/100
Node 3: Rep 300/5000 | Alteration 0/0 (only gains alteration once it reaches X, e.g. the 90th percentile of top reputation)

Let's say Node 1 wins the first data shard (the tie could be broken by how long each node has been active). Its alteration now drops by X (for example, 50).

New data shard request.
Node 1: Rep 5000/5000 | Alteration 50/100
Node 2: Rep 5000/5000 | Alteration 100/100

Now Node 2 gets the shard.

Because Node 1 was eligible but didn't have a high enough alteration, it gains 10.

This would create a round robin of sorts, sharing shards among the highest-reputation nodes without completely choking off opportunities for the other nodes.

I would think the alteration should only take place if two or more nodes were near max reputation AND in the same /24.

Basically let the best node win, if they tie, share the wealth.
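A minimal sketch of the proposal above, using the example numbers (penalty 50, bonus 10, cap 100). Two points are assumptions, not part of the proposal as written: "near max reputation" is read as at least 90% of the top reputation in the /24, and ties in alteration are broken by a hypothetical `uptime_days` field standing in for "time the node has been active".

```python
from dataclasses import dataclass

# Values from the worked example above; real values would need tuning.
WIN_PENALTY = 50          # alteration lost by the node that receives a shard
WAIT_BONUS = 10           # alteration gained by an eligible node that was passed over
MAX_ALTERATION = 100
ELIGIBLE_FRACTION = 0.9   # assumed reading of "90th percentile of top reputation"


@dataclass
class SubnetNode:
    name: str
    reputation: int
    uptime_days: int      # hypothetical tie-breaker
    alteration: int = MAX_ALTERATION


def pick_shard_holder(nodes):
    """Choose which node in one /24 receives the next shard, updating alteration scores."""
    top = max(n.reputation for n in nodes)
    eligible = [n for n in nodes if n.reputation >= ELIGIBLE_FRACTION * top]
    if len(eligible) == 1:
        # A single clear leader: plain "best reputation wins", alteration untouched.
        return eligible[0]
    winner = max(eligible, key=lambda n: (n.alteration, n.uptime_days))
    for n in eligible:
        if n is winner:
            n.alteration = max(0, n.alteration - WIN_PENALTY)
        else:
            n.alteration = min(MAX_ALTERATION, n.alteration + WAIT_BONUS)
    return winner
```

Running it on Node 1 and Node 2 (both 5000, Node 1 active longer) plus Node 3 (300, alteration 0) hands shards out as Node 1, Node 2, Node 1, Node 2, ... After the second shard, Node 1 is at 60 (50 + 10) and Node 2 at 50. Node 3 never qualifies while the other two are near the top.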

Pentium100 · February 10, 2020, 9:08pm

In the other topic, @Vadim mentioned that he has a few nodes in the same /24 subnet with a total of about 25TB.

So, it would seem that multiple nodes in the same /24 are not completely aggregated into one big node, and that running multiple nodes results in more data.
I would like clarification on that. Does it apply only to nodes with different IPs (in the same /24), or also to nodes with the same IP?
I have free space on my server, could spin up a few more VMs…

Vadim · February 10, 2020, 9:13pm

I also mentioned that it took a lot of time to get this data.

donald.m.motsinger · February 10, 2020, 9:18pm

As I understood, he has a total of 25TB available, of which 1/3 is filled.

I guess he got this much data because he has this much space, bandwidth and CPU power available. He didn't get more than he would have with a single 25TB node.

Vadim · February 10, 2020, 9:25pm

I have a 300/300 connection and about 85-92% ingress success.

KernelPanick · February 10, 2020, 9:47pm

I have 900/900 with similar ingress success on a single node. It's not even breaking a sweat on its hardware, so it does seem that more nodes in a single /24 get more data. I have about 4TB, while yours is estimated at around 8TB. I'm assuming I'm the only user in my /24, though, and I could be wrong. If that's the case, the race is back on to win more data than your neighbors.

Vadim · February 10, 2020, 9:48pm

How long have you been running it?

Pentium100 · February 10, 2020, 9:49pm

I have very high success percentage as well, but only 4.6TB.
My node has never run out of space (it has an 8TB virtual disk now, and Storj has assured me multiple times that the amount of free space on a node does not matter when choosing a node to store a piece, as long as the piece fits; it was a bit different with v2).
1/3 out of 25TB is 8.3TB, still a lot.
As for how long: I have been running my node since the end of March.
My connection is 1000/600.
@Vadim also mentioned that he got an average upload of 150 Mbps in January. I got 22 Mbps.

So, it looks like the “supernode” gets more data and more traffic than a single node.
I wonder whether this is due to the way IP filtering works or just due to inefficiencies in the node software that make running multiple copies work better.

If it's the former, it pays to have more nodes than your neighbor. If you have 10 nodes and your neighbor has 2, you'll get 5 times more data than he does.

KernelPanick · February 10, 2020, 9:50pm

I've had a single node since about a month after v3 started sending invites.

Vadim · February 10, 2020, 9:52pm

There is a point to it: the satellite chooses 110 nodes for each file upload, so if you have more nodes, you have more chances to be among those 110. As far as I know, the aggregation is there to make sure no more than 1 piece of ONE file goes to one node.
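A rough sketch combining the 110-node figure with the /24 rule as Alexey described it earlier in the thread. The assumptions are that the 110 slots go to 110 distinct /24 subnets sampled uniformly without replacement, and that the node inside a chosen /24 is picked uniformly. The 5,000-subnet network size used below is purely illustrative, and the function names are hypothetical.

```python
from fractions import Fraction

NODES_PER_UPLOAD = 110  # figure quoted above, treated here as 110 distinct /24 subnets


def subnet_selection_chance(total_subnets: int, per_upload: int = NODES_PER_UPLOAD) -> Fraction:
    """Chance a given /24 is picked, if subnets are sampled uniformly without replacement."""
    if total_subnets <= per_upload:
        return Fraction(1)
    return Fraction(per_upload, total_subnets)


def share_within_subnet(my_nodes: int, other_nodes: int) -> Fraction:
    """Fraction of that subnet's pieces landing on my nodes, with a uniform pick inside the /24."""
    return Fraction(my_nodes, my_nodes + other_nodes)
```

With 5,000 active subnets, a given /24 would be picked for 110/5000 = 11/500 (2.2%) of uploads. Under these assumptions, adding nodes inside that /24 does not change this figure; it only changes how the subnet's pieces are split. With 10 nodes against a neighbor's 2, the split is 5/6 to 1/6, the 5x ratio mentioned above.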