IP Filtering, Deep Dive

This post goes over IP filtering.

With today’s v0.14.3 release, we’ve implemented a feature called IP filtering, which will ensure that pieces of the same file are not stored in the same geographical area, based on logical subnets.

I’m wondering how this works in practice. I’ve been promoting the Storj network to many people in my community who are on the same ISP, which at times puts them in the same IP subnet (anywhere from /8 to /24).

What counts as a “logical subnet”: /8, /16, /24, or /29?
Does IP filtering mean that all nodes within a /24 are filtered together?

Do you have an internal way to determine geolocation from the Docker/node software, and is it based on the public IP?

I totally understand that you need safeguards to prevent something like the city-wide outage mentioned in the post.
I just want to fully understand how this is going to work!

Thanks all!

The goal is not storing the pieces of the same segment in the one physical place.
We don’t have a better filter at the moment, so we use a /24 IP mask in the node selection process. From the filtered set, two nodes are chosen, and the better of the two (by reputation) is selected.
This means that all nodes within the same /24 are treated as a single node.

This filter does not apply to audits.
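One reading of the description above, as a minimal Python sketch: nodes are bucketed by their /24 prefix, and each bucket yields a single candidate by drawing two of its nodes at random and keeping the one with the better reputation. The `Node` type, its `reputation` field, and `one_candidate_per_subnet` are hypothetical names for illustration, not the satellite’s actual code.

```python
import ipaddress
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    ip: str
    reputation: float  # hypothetical score; higher is better


def subnet_of(ip: str) -> ipaddress.IPv4Network:
    # Mask to /24: every node in 203.0.113.0/24 maps to the same key
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def one_candidate_per_subnet(nodes: list[Node], rng: random.Random) -> list[Node]:
    """Collapse each /24 into a single candidate: draw up to two of its
    nodes at random and keep the one with the better reputation."""
    by_subnet: dict[ipaddress.IPv4Network, list[Node]] = defaultdict(list)
    for node in nodes:
        by_subnet[subnet_of(node.ip)].append(node)

    candidates = []
    for members in by_subnet.values():
        pair = rng.sample(members, k=min(2, len(members)))
        candidates.append(max(pair, key=lambda n: n.reputation))
    return candidates
```

Because each /24 contributes exactly one candidate, adding more nodes to the same subnet does not add more candidates, which is what “treated as a single node” means in practice.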

Time to stop promoting in my community…

Is there a way to check whether nodes are being filtered?
If so, they could be moved into a new /24 pool.

Thanks for the quick response!

storjnet.info could help provide this.

  1. Return all server address values
  2. Resolve DNS names to IP addresses
  3. Keep only the IP addresses and sort them
  4. Create a filter to detect IPs that share the first three octets (x.x.x.y)
  5. Display any nodes resolving to those duplicate prefixes
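The steps above can be sketched in Python as follows, assuming you already have a list of `host:port` server addresses (for example, collected from storjnet.info; how you obtain them is outside this sketch). `resolve` and `shared_slash24` are hypothetical helper names, and only IPv4 is handled.

```python
import ipaddress
import socket
from collections import defaultdict


def resolve(address: str) -> str:
    """Resolve a "host:port" server address to an IPv4 address string."""
    host = address.rsplit(":", 1)[0]  # drop the port (IPv4/hostnames only)
    return socket.gethostbyname(host)


def shared_slash24(addresses: list[str], resolver=resolve) -> dict[str, list[str]]:
    """Return {"x.x.x.0/24": [addresses...]} for every /24 shared by 2+ nodes."""
    groups: dict[ipaddress.IPv4Network, list[str]] = defaultdict(list)
    for addr in addresses:
        ip = resolver(addr)
        # strict=False zeroes the host bits: 203.0.113.77/24 -> 203.0.113.0/24
        groups[ipaddress.IPv4Network(f"{ip}/24", strict=False)].append(addr)
    # Sorting the network keys orders the report by IP (step 3)
    return {
        str(net): members
        for net, members in sorted(groups.items())
        if len(members) > 1  # steps 4-5: keep only duplicated prefixes
    }
```

Passing a custom `resolver` makes it easy to test the grouping without touching DNS.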

That’s a great idea!

There is a misunderstanding about the filtering. The filter/selector just ensures that only one piece of a file segment is stored in any given /24. The node within that /24 is selected at random, so every node will still get data.

Right, if only one node gets selected in a given /24, the top-reputation node will win.

That makes it much harder for the other nodes in the same /24 to get data. I’m not sure how reputation reacts when a node receives data. Imagine a “received data” stat that drops when a node gets data and builds back up over time.

The selection within a /24 segment is purely random!
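A minimal sketch of that behavior, assuming the satellite picks distinct /24 subnets at random and then picks one node uniformly at random inside each chosen subnet (this follows the posts above; the function name and the `node_ips` layout are hypothetical):

```python
import ipaddress
import random
from collections import defaultdict


def select_for_segment(node_ips: dict[str, str], n: int, rng: random.Random) -> list[str]:
    """Pick n node IDs for one segment so that no two share a /24.

    node_ips maps node ID -> IPv4 address. Each /24 is sampled as a single
    unit, then one of its nodes is chosen uniformly at random.
    """
    by_subnet: dict[ipaddress.IPv4Network, list[str]] = defaultdict(list)
    for node_id, ip in node_ips.items():
        by_subnet[ipaddress.IPv4Network(f"{ip}/24", strict=False)].append(node_id)

    subnets = list(by_subnet)
    if n > len(subnets):
        raise ValueError("not enough distinct /24 subnets for this segment")
    chosen = rng.sample(subnets, n)  # distinct subnets, uniformly at random
    return [rng.choice(by_subnet[net]) for net in chosen]
```

Under this model every node in a shared /24 still gets picked sometimes, but the subnet as a whole is offered pieces no more often than a subnet with a single node.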

Here’s how I currently understand it, though I could be wrong.

Nodes A and B are on the same /24.

Node A wins the random selection for a piece. Node A does not win the race against the other nodes, so its upload is context-cancelled before finishing.

Node B would have won the race had it been selected instead, because it performs better than Node A.

Node A is less likely than an unfiltered node to be selected for pieces because it shares the same selection with Node B. Node A will also receive fewer pieces than Node B, because it performs worse.

Node B will get more pieces than Node A because it performs better, but it still gets far fewer opportunities to compete than an unfiltered node.

I would suppose that having an unfiltered node matters much more than having a better-performing node, unless traffic reaches the point where it bottlenecks the node.
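The reasoning above can be put into numbers. A minimal sketch, assuming every /24 is offered the same number of pieces, the node inside a /24 is picked uniformly at random, and the race-win rates are made-up values (A at 60%, B at 90%, and an unfiltered node C that is as slow as A). `expected_pieces` is a hypothetical helper.

```python
from fractions import Fraction


def expected_pieces(offers_to_subnet: int, nodes_in_subnet: int, win_rate: Fraction) -> Fraction:
    """Expected pieces a node keeps, assuming its /24 is offered
    `offers_to_subnet` pieces, the node is picked uniformly at random within
    the /24, and it then wins the upload race with probability `win_rate`."""
    return offers_to_subnet * Fraction(1, nodes_in_subnet) * win_rate


# Hypothetical numbers: every /24 receives 1000 offers.
node_a = expected_pieces(1000, 2, Fraction(6, 10))  # shares a /24 with B, slower
node_b = expected_pieces(1000, 2, Fraction(9, 10))  # shares a /24 with A, faster
node_c = expected_pieces(1000, 1, Fraction(6, 10))  # alone in its /24, as slow as A
print(node_a, node_b, node_c)  # 300 450 600
```

Under these assumptions C, although it performs like A, keeps twice as many pieces as A and more than B.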

That is how it works right now.

The filtering must be done, if not the reliability is compromised.

You could factor a few more things into your calculation. When are you really in the same /24 as others? Mostly in a datacenter, and those nodes will win the race more often than users on a home connection.


I’ve been pondering this IP filtering problem, and I think I have a fair solution for it.

Keep the /24 IP filtering. Keep the current reputation selection process with a slight alteration.

If there is only one node in the /24, the alteration would still take place but have no effect.

Example:
If there are two or more nodes, the alteration starts to work.

The node with the highest reputation is selected first.

If two or more nodes have the highest obtainable reputation, the alteration comes into play.

Node 1: Rep 5000/5000 | Alteration 100/100
Node 2: Rep 5000/5000 | Alteration 100/100
Node 3: Rep 300/5000 | Alteration 0/0 (only gains alteration once it reaches X, e.g. the 90th percentile of the top reputation)

Let’s say Node 1 wins the first data shard; the tie could be broken by how long each node has been active. Node 1’s alteration now drops by a set amount (example: 50).

New data shard request.
Node 1: Rep 5000/5000 | Alteration 50/100
Node 2: Rep 5000/5000 | Alteration 100/100

Now Node 2 gets the shard.

Because Node 1 was eligible but didn’t have a high enough alteration, it gains 10.

This would create a round robin of sorts that spreads shards among the highest-reputation nodes without completely choking off the other nodes’ opportunities.

I would think the alteration should only take place if two or more nodes were near max reputation AND in the same /24.

Basically let the best node win, if they tie, share the wealth.
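The proposal above can be sketched as a small simulation. The constants (100 maximum alteration, a 50-point penalty for the winner, a 10-point bonus for passed-over eligible nodes, and eligibility at 90% of the top reputation) are the example values from the post; `Node`, `pick_in_subnet`, and using active hours as the tie-breaker are illustrative assumptions. Unlike the table, Node 3 keeps a default alteration value, but it is ignored while the node is ineligible.

```python
from dataclasses import dataclass

# Illustrative values taken from the example above.
MAX_ALTERATION = 100
WIN_PENALTY = 50         # alteration lost by the node that gets the shard
WAIT_BONUS = 10          # alteration regained by eligible nodes that were passed over
ELIGIBLE_FRACTION = 0.9  # "near max" = at least 90% of the top reputation in the /24


@dataclass
class Node:
    name: str
    reputation: int
    active_hours: int  # tie-breaker: how long the node has been active
    alteration: int = MAX_ALTERATION


def pick_in_subnet(nodes: list[Node]) -> Node:
    """Choose which node in one /24 receives the next shard."""
    top = max(n.reputation for n in nodes)
    eligible = [n for n in nodes if n.reputation >= ELIGIBLE_FRACTION * top]
    # Among near-max nodes, the highest alteration wins; ties go to the longest-running node.
    # With a single eligible node this still runs but cannot change the outcome.
    winner = max(eligible, key=lambda n: (n.alteration, n.active_hours))
    for n in eligible:
        if n is winner:
            n.alteration = max(0, n.alteration - WIN_PENALTY)
        else:
            n.alteration = min(MAX_ALTERATION, n.alteration + WAIT_BONUS)
    return winner


nodes = [Node("Node 1", 5000, 2000), Node("Node 2", 5000, 1500), Node("Node 3", 300, 3000)]
print([pick_in_subnet(nodes).name for _ in range(4)])
# ['Node 1', 'Node 2', 'Node 1', 'Node 2']
```

The two equally reputable nodes alternate, matching the 50/100 and then 60 values in the example, while Node 3 is never picked until its reputation comes within 90% of the top.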

In the other topic, @Vadim mentioned that he has a few nodes in the same /24 subnet, with a total of about 25TB.

So, it would seem that multiple nodes in the same /24 are not completely aggregated into one big node and running multiple nodes results in more data.
I would like clarification on that. Is it only for nodes with different IPs (same /24) or would that apply to nodes with the same IP?
I have free space on my server, could spin up a few more VMs…

I also mentioned that it took a lot of time to get that data.

As I understood, he has a total of 25TB available, of which 1/3 is filled.

I guess he got this much data because he has this much space, bandwidth and CPU power available. He didn’t get more than he would have with a single 25TB node.

I have a 300/300 connection and about 85-92% ingress success.

I have 900/900 with similar ingress success on a single node. It’s not even sweating on its hardware, so it does seem that more nodes in a single /24 get more data: I have about 4TB, while you estimate around 8TB. I am assuming that I’m the only user in my /24, though, and I could be wrong. If that’s the case, the race is on again to get more data than your neighbors. :stuck_out_tongue:

How long have you been running it?

I have very high success percentage as well, but only 4.6TB.
My node has never run out of space (it has an 8TB virtual disk now, and Storj has assured me multiple times that the amount of free space on a node does not matter when choosing a node to store a piece, as long as the piece fits; it was a bit different with v2).
1/3 out of 25TB is 8.3TB, still a lot.
As for how long, I have been running my node since the end of March.
My connection is 1000/600.
@Vadim also mentioned that he got a 150 Mbps upload average in January. I got 22 Mbps.

So, it looks like the “supernode” gets more data and more traffic than a single node.
I wonder whether this is due to the way the IP filtering works or just due to inefficiencies in the node software that make running multiple copies work better.

That would mean it pays to have more nodes than your neighbor. If you have 10 nodes and your neighbor has 2, you’ll get 5 times more data than him.
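The 5x figure follows if the node inside a /24 is picked uniformly at random and all nodes perform equally (both are assumptions). A minimal sketch with a hypothetical helper:

```python
from fractions import Fraction


def subnet_shares(nodes_per_operator: dict[str, int]) -> dict[str, Fraction]:
    """Share of a /24's pieces each operator expects, assuming the node in the
    /24 is picked uniformly at random and every node wins its races equally."""
    total = sum(nodes_per_operator.values())
    return {op: Fraction(count, total) for op, count in nodes_per_operator.items()}


shares = subnet_shares({"you": 10, "neighbor": 2})
print(shares["you"], shares["neighbor"], shares["you"] / shares["neighbor"])  # 5/6 1/6 5
```

The subnet’s total share stays the same; only the split between the two operators changes.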

I’ve had a single node since about a month after v3 started sending invites.

There is something to that: the satellite chooses 110 nodes for each file upload, so if you have more nodes, you have more chances to be among those 110. As far as I know, the aggregation is there so that no more than one piece of ONE file goes to one node.
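The “more chances” argument can be checked with a hypergeometric calculation: if the satellite draws 110 units uniformly without replacement from N, the chance that at least one of your m units is drawn is 1 - C(N - m, 110) / C(N, 110). A minimal sketch, assuming uniform selection; the network sizes (5000 nodes, 4000 /24 subnets) are made-up numbers and `p_any_selected` is a hypothetical helper. Under the /24 filtering described earlier in the thread, the unit is a /24 rather than a node, so your nodes in one /24 count as m = 1.

```python
from math import comb


def p_any_selected(your_units: int, total_units: int, picks: int) -> float:
    """Probability that at least one of `your_units` is among `picks` units
    drawn uniformly without replacement from `total_units`."""
    return 1 - comb(total_units - your_units, picks) / comb(total_units, picks)


# Hypothetical network: 10 of your nodes among 5000 nodes in 4000 distinct /24s.
per_node = p_any_selected(10, 5000, 110)  # if every node were its own unit
per_subnet = p_any_selected(1, 4000, 110)  # your 10 nodes share one /24
print(f"per node: {per_node:.3f}, per /24: {per_subnet:.3f}")
```

With one unit the formula reduces to 110 / N, so under /24 filtering adding nodes inside the same subnet does not raise the chance of being picked for an upload.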