Using multiple IPs on the same physical service

arrogantrabbit · September 10, 2024, 1:40am

The design of the network is already Byzantine fault tolerant and stable when most node operators act in self-interest. The /24 rule is just a tweak on top, and it only needs to apply to the majority of use cases, not all of them, to be effective.

kotapuzis · September 10, 2024, 2:41am

The /24 rule was created to stop operators from creating too many IP addresses inside their own /24 subnet (and thus cheating the system). Now we have a problem with some operators cheating entire /24 subnets.

Since when did this start? Since Storj came up with the first rule to split traffic equally between addresses. We added a rule => we got cheaters.

Operators would have no desire to use any extra addresses if they weren’t encouraged to do so by a dubious rule.

Alexey · September 10, 2024, 4:46am

Not between addresses; we need the nodes to be as uncorrelated as possible with respect to any common factors, so that not too many pieces of the same segment become unavailable when one of those common factors fails: the same server, the same power grid, the same ISP, etc.
The first rule did not allow running more than one node per server, but it was enforced only by the ToS. The goal was the same: to protect the network from correlated failures, otherwise it cannot deflect a Sybil attack.
The /24 rule was the first technical rule, implemented in the default node selection, but the second rule in this regard. It amended the first one by allowing more than one node on the same server, but only in the same /24 subnet of public IPs. This helped to distribute pieces better across different locations, increasing the durability of the network.
The “choice of n” rule just extended the /24 rule: it also allows selecting the fastest node for the customer’s location using the success score. As a side effect, it reduces the risk of correlated failures caused by the Byzantine behavior of some Operators who were smart enough to bypass the /24 rule and still make a profit even after the reduced payouts. Their solutions likely have a lower success rate over time, which protects against having more pieces of the same segment in the same location or under the control of one Operator.
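
To make the /24 grouping concrete, here is a minimal Python sketch. It is not the satellite's actual selection code: the `(node_id, public_ip)` input shape and the function names are assumptions made for illustration, and the success-score part of "choice of n" is left out.

```python
import ipaddress
import random
from collections import defaultdict


def subnet_key(ip: str) -> str:
    """Return the /24 network that an IPv4 address falls into."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def pick_one_per_subnet(nodes, count, rng=random):
    """Pick up to `count` node IDs, never two from the same /24.

    `nodes` is a list of (node_id, public_ip) pairs (hypothetical shape).
    """
    by_subnet = defaultdict(list)
    for node_id, ip in nodes:
        by_subnet[subnet_key(ip)].append(node_id)
    # Every /24 is equally likely, no matter how many nodes sit inside it.
    chosen = rng.sample(sorted(by_subnet), k=min(count, len(by_subnet)))
    return [rng.choice(by_subnet[s]) for s in chosen]


nodes = [
    ("node-a", "203.0.113.10"),
    ("node-b", "203.0.113.77"),   # same /24 as node-a
    ("node-c", "203.0.113.200"),  # same /24 as node-a
    ("node-d", "198.51.100.5"),
]
picked = pick_one_per_subnet(nodes, count=2, rng=random.Random(1))
print(picked)
```

With this grouping, the three nodes in `203.0.113.0/24` share one slot between them, so at most one of them can hold a piece of any given segment.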

kotapuzis · September 10, 2024, 2:53pm

The problem arises when you want to decorrelate something meaningful to the network (you want to avoid common downtimes), but you do this by irrelevant factors (preferring different IPs).

Imagine we have a node operator who has two ISP connections at home. Each connection is not so fast and not so reliable. But the node operator can use the two connections together to get nearly 100% uptime and twice the bandwidth most of the time. The problem is that he is not allowed to. His server is “in the same physical location”, and his intention to make the network better is “altering default network behavior”. That’s regulation at its worst: the operator has to cheat the system only to make it good.

I believe that technical preference of different IPs can increase network reliability to some extent. But writing strict restrictions into the Terms & Conditions leads to the opposite effect: it attracts cheaters (who have nothing to lose) and discourages honest users (who don’t want to get banned for their useful contribution).

Toyoo · September 10, 2024, 11:30pm

It’s not irrelevant. It’s correlated with the desired factor, making it a proxy of the true, but difficult to measure variable. When you have a choice to not measure something, or measure at least some proxy, the latter still seems better.

Consider what would happen without the /24 rule. Node operators would set up thousands of nodes to get as much traffic as possible. The rule does not kill all bad behavior, but makes it quite a bit more difficult—hopefully to the degree where it’s no longer a concern. We don’t need perfect rules, just good enough.

There’s this single human who threatened to kill off open internet access for a whole country—coincidentally the one with the biggest number of storage nodes. What kind of a natural disaster would be able to do this?

JWvdV · September 12, 2024, 6:28pm

That might be true for the earthquakes. Perhaps there should be a rule in the ToS that the housing each single node is in should be anti-fragile.
As the saying goes, to govern is to predict. Which means also for Storj, to do a risk estimation every step along the way and try to reduce risks by taking measures.
In the end you might end up micro-managing. But spreading segments geographically and over different SNOs, isn’t micro for a distributed storage: it’s their core.

Agreed, and if the /24 rule is already enough, then it’s enough. But I’m not so convinced. I doubt whether it really is a proxy for the things you want to separate. For example, my neighbor and I have the same ISP. Only the last 50m of the glass fiber are different; the remainder is the same for almost everything: power, disaster risk, etc. But we’re in totally different subnets, like x.x.x.x/3 different. My father-in-law and my brother live about 60 miles away from each other. They have the same ISP and are in the same /20 network. I myself had that same ISP until last year; living 20 miles from my brother, I was in the same /24 subnet as my father-in-law (50 miles from here).

When I signed up for Oracle Cloud to overcome CG-NAT, I set up three servers, which were all in different /24 subnets. But all were located in the same area, according to Oracle.

Indeed, also if you start 20 nodes in one subnet, even when using one IP, they are equally weighted as far as I can see. So, if you’re in a subnet with a big competing operator (big not necessarily in amount of data, but just in node count), you’re out of luck. Or you can take advantage of it by firing up multiple nodes in one subnet.

But then, the AS number would make more sense in my opinion. Look at my internet provider, for example: AS50266 Odido Netherlands B.V. details - IPinfo.io. It has many IP segments, so that even neighbors can have IP addresses from different segments.

Actually, I’m really asking why they chose the /24 rule. I’m thinking of convenience and easy implementation. But I doubt whether it really is a good proxy for correlated factors.

That’s what I doubt. Would really like to see an underpinning of this statement.

What if we made the likelihood of choosing any receiving node from an operator directly proportional to a power (slightly above 1) of that operator’s total used space in the network: P([choice of certain node])=SUM([used space of all nodes with same wallet-address])^1.05/SIGMA(SUM([size of nodes by wallet-address])^1.05). It would be an incentive for node operators to identify themselves with the same wallet address over the whole network. Of course, it could also be any other function with increasing growth.
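
As a rough illustration of the formula above, here is a minimal Python sketch. It assumes used space is given in TB per node and that the `(wallet, used_tb)` input shape is available; both are choices made for this example, not anything the satellite exposes in this form.

```python
from collections import defaultdict


def wallet_probabilities(nodes, exponent=1.05):
    """Return {wallet: selection probability} for (wallet, used_tb) pairs.

    Each wallet's weight is its total used space raised to `exponent`;
    the denominator is the sum of those weights over all wallets.
    """
    used = defaultdict(float)
    for wallet, used_tb in nodes:
        used[wallet] += used_tb
    weights = {w: u ** exponent for w, u in used.items()}
    total = sum(weights.values())
    if total == 0:
        # With a pure power function, an all-empty network has no weights.
        raise ValueError("no used space anywhere, probabilities undefined")
    return {w: weight / total for w, weight in weights.items()}


# Wallet A runs two 1 TB nodes; B and C each run a single 1 TB node.
probs = wallet_probabilities([("A", 1), ("A", 1), ("B", 1), ("C", 1)])
print(probs)
```

Because the exponent is above 1, wallet A ends up with slightly more than twice the probability of B or C, which is the intended incentive to report everything under one wallet.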

Yeah sure, the man who also had over 60% of the network…

A giant meteorite, to start with. Same probability. A giant earthquake, double probability. A flood, killing all employees of Storj on the same conference. …

Edit: functions should be of increasing growth instead of decreasing growth.

Mitsos · September 12, 2024, 6:51pm

FYI: Best Current Operational Practice for Operators: IPv6 prefix assignment for end-users - persistent vs non-persistent, and what size to choose — RIPE Network Coordination Centre

Someone mentioned that the “/24” rule for IPv6 is /64, so according to the actual real world guidelines (ie what every ISP in Europe is currently implementing), that means that each SNO can end up with 256 different-ipv6-nodes in his own house without the network doing anything about it.
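
The 256 figure follows from a /56 delegation per household, which is assumed here as the typical end-user prefix size. A minimal Python sketch using the standard `ipaddress` module and a documentation prefix (`2001:db8::/32`), with a hypothetical `group_key` helper:

```python
import ipaddress


def group_key(addr: str, prefix_len: int) -> str:
    """Return the network of the given prefix length that contains addr."""
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


# One household delegation (assumed to be a /56) split into /64 networks.
delegated = ipaddress.ip_network("2001:db8:1200::/56")
slash64s = list(delegated.subnets(new_prefix=64))
print(len(slash64s))  # number of /64 networks inside one /56

# Two nodes in the same house, each placed in its own /64.
node_1 = "2001:db8:1200:1::10"
node_2 = "2001:db8:1200:2::10"
print(group_key(node_1, 64) == group_key(node_2, 64))  # grouped by /64
print(group_key(node_1, 56) == group_key(node_2, 56))  # grouped by /56
```

Grouping by /64 treats the two nodes as unrelated, while grouping by /56 puts them in the same bucket.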

TL;DR: Since Storj doesn’t want to do AS filtering (as is the correct real world use) for nodes, at least try to get reasonable filters in place.

Toyoo · September 13, 2024, 12:20am

What do you mean by size? How do you measure it, and how do you prevent nodes from lying when it is self-reported?

JWvdV · September 13, 2024, 5:05am

Used space as reported by the satellite, so nothing to lie about. It also incorporates experience, performance and loyalty to the network (experienced, loyal operators with more and better-performing nodes end up with larger nodes). It’s also an incentive to identify yourself in the whole network with the same credentials (wallet). And it lowers the significance of the IP address.

Alexey · September 13, 2024, 7:10am

Yes, but we are not there yet. Also, using the AS as a filter may be too much: subnets may be in different houses and have separate power supplies. In that case /24 should be sufficient. Right now the only issue is multiple VPN nodes on the same server without high availability.

How would it affect new nodes with almost 0 usage? They would get almost no data, so they could not grow enough to be selected more often. Looks like a chicken and egg problem.

andrew2.hart · September 13, 2024, 8:55am

Maybe it is time for Know Your Customer, for SNOs.

kocoten1992 · September 13, 2024, 9:15am

“Know Your Supplier”, you mean? They are sort of already doing that with commercial nodes (the ones without /24). But that also pushes them back a bit, because there is not enough risk aversion among SNOs. The only solution I see is to keep the goals of the company and the SNOs aligned; because of the beauty of statistics, enough people will do exactly what the company predicts.

P/s: if both /24 and KYC, then forget it: there won’t be enough SNOs, or they just won’t be interested at all, because of the minimum payout and having to do KYC.

P/s: Or worse, KYC cheating, and now we’re back to square one…

Toyoo · September 13, 2024, 11:32am

Ok, and what do you mean by sigma?

JWvdV · September 13, 2024, 3:23pm

Well, it was just an example. You just should have an incentive to stay long, with a lot of data and identified as one SNO with only one parameter.

So, it might also be (SUM([used space of all nodes with same wallet-address])+1)^(1.1) or 0.1+[used space of all nodes with same wallet-address]*ln([used space of all nodes with same wallet-address]). Or any other way to correct for the continuity problem.

See: plot of y=0.1+x*ln(x+150)/5 and y=(x+1)^1.05 and y=x for x=0 to 1000 - Wolfram|Alpha (including one linear formula for comparison)
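
The three curves from that plot can also be tabulated with a few lines of Python. This is only a sketch of the same comparison; `x` stands for a wallet's used space in an arbitrary unit, and the function names are made up for the example.

```python
import math


def linear(x):
    """Plain proportional weight, shown for comparison."""
    return x


def log_weight(x):
    """0.1 + x*ln(x+150)/5, the logarithmic variant from the plot."""
    return 0.1 + x * math.log(x + 150) / 5


def power_weight(x):
    """(x+1)^1.05, the power variant from the plot."""
    return (x + 1) ** 1.05


for x in (0, 1, 10, 100, 1000):
    print(f"x={x:5d}  linear={linear(x):9.2f}  "
          f"log={log_weight(x):9.2f}  power={power_weight(x):9.2f}")
```

Both corrected variants give an empty wallet a small non-zero weight at `x=0`, so a brand-new node is not locked out of selection, and both grow faster than the linear curve for large `x`.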

JWvdV · September 13, 2024, 4:45pm

Won’t happen with my idea, because your ingress no longer depends on the number of nodes, only on the used space you already have. And if you also favor bigger SNOs a bit, they are likely to identify themselves across the whole network with the same credentials (wallet in this case, but mail address could also do it), because in that case you get more ingress for 2TB than for 2x1TB. It also eliminates the benefit of having multiple IPs.

JWvdV · September 13, 2024, 4:49pm

Total of all nodes, but will write the idea down in SQL if I have some time.

Table: nodes [nodeID, wallet, full, usageInBytes]

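-- Written for MySQL 8 syntax (LIMIT, POW, window functions).
-- Outer query: number the non-full nodes of each chosen wallet in random order.
SELECT
	ROW_NUMBER() OVER (PARTITION BY Ns.wallet ORDER BY RAND()) AS FirstRow,
	Ns.*
FROM
	(
		-- Inner query: up to 320 wallets, ranked by weight times a random factor.
		SELECT
			wallet,
			POW(0.5 + SUM(usageInBytes) / 1E12, 1.05) * RAND() AS weightTimesRandom
		FROM
			nodes
		GROUP BY
			wallet
		HAVING
			-- keep only wallets with at least one node that is not full
			(SUM(`full` = 0) > 0)
		ORDER BY
			weightTimesRandom DESC
		LIMIT 320
	) Wallets
INNER JOIN
	nodes Ns
ON
	(
		(Ns.wallet = Wallets.wallet)
	AND
		(Ns.`full` = 0)
	)
-- Rows with FirstRow = 1 come first, i.e. one random node per wallet.
ORDER BY
	FirstRow
LIMIT 320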

I corrected the flaw: it should be a function with increasing growth instead of one with decreasing growth.
Since SQRT(1)+SQRT(1) = 2 [2 SNO’s with 1TB], while SQRT(1+1) = 1.41 [1 SNO with 2TB]
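
The same arithmetic as a small Python check, comparing the square root with the 1.05 power. `merge_gain` is a hypothetical helper that returns how much total weight an operator gains (or loses) by reporting two nodes under one wallet instead of two.

```python
import math


def merge_gain(weight, a_tb, b_tb):
    """Weight of one merged wallet minus the sum of two separate wallets."""
    return weight(a_tb + b_tb) - (weight(a_tb) + weight(b_tb))


def power_105(x):
    """Weight function with increasing growth (exponent above 1)."""
    return x ** 1.05


print(merge_gain(math.sqrt, 1, 1))   # negative: splitting pays off
print(merge_gain(power_105, 1, 1))   # positive: merging pays off
```

A negative gain rewards splitting into many small identities, while a positive gain rewards identifying as one operator.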

kotapuzis · September 14, 2024, 1:22am

Great idea. Once we check everyone’s IDs, we’ll finally know what gender our SNOs are. And then we’ll be able to distribute traffic fairly, so that men get the same summary amount of traffic as women.

Alexey · September 14, 2024, 4:59am

We prefer smaller SNOs because they are less correlated and give a wider data distribution, which is very helpful for CDN-like usage.
Big SNOs always try to concentrate most of the data, which unbalances the distribution and reduces durability.

Or do you mean those big SNOs which control multiple locations? And/or have several 300Gbps lines with highly available setups in a DC? If so, they can be used, but still not preferred, unless it’s a customers’ requirement (then welcome to the Storj Select).

JWvdV · September 14, 2024, 8:26pm

You say so, but we don’t do so in the first place, because best-of-n favors those with the most optimized connection and the most bandwidth. Guess who they are.

Second, they also are the ones most successfully downloading it. So they are favored twice by this policy.

They aren’t necessarily less correlated; you just know less about their correlation, because it’s easier for smaller SNOs to cheat. Moreover, it’s really questionable whether smaller nodes contribute significantly to durable distribution. They are also more likely to fail and have less to stay in the network for. So, essentially, the question is: is it better to have 2,000 SNOs with 100TB on average (having had at least 4 years to build up this amount), or 20,000 SNOs with 10TB on average (which can be the influx of less than half a year these days)? With RS numbers of 29/80, it might not be an important question. But if you try to lower it to, for example, 29/45, it will be. I doubt whether any model will favor more short-staying SNOs over fewer, longer-staying nodes (aside from less repair traffic).
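
To get a feeling for why 29/80 and 29/45 differ so much, here is a minimal Python sketch. It assumes every piece is lost independently with the same probability and ignores repair entirely; both are simplifications, and correlated failures are exactly what would break the independence assumption. The 10% failure probability is an arbitrary illustrative number.

```python
from math import comb


def segment_loss_probability(k, n, p_fail):
    """Probability that fewer than k of n pieces survive.

    Assumes each piece fails independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    return sum(comb(n, i) * p_ok**i * p_fail**(n - i) for i in range(k))


for k, n in ((29, 80), (29, 45)):
    loss = segment_loss_probability(k, n, p_fail=0.10)
    print(f"RS {k}/{n}: expansion {n / k:.2f}x, loss probability {loss:.3e}")
```

With less redundancy, the same per-piece failure rate leaves far fewer spare pieces, so node reliability and correlation matter much more at 29/45 than at 29/80.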

So to be fair: I really doubt whether you favor small SNOs, and whether you should favor them.

Alexey · September 15, 2024, 3:04am

The whole point of implementing rules to distribute pieces more randomly is to prefer independent nodes. You called them small SNOs, which could cause confusion, because the two are not necessarily the same: we do not select SNOs, we select their nodes.

The second is better, because pieces are distributed more widely, even if each node is not as robust as the ones in the first group. The second is also preferred for CDN usage. From any point of view, having 20,000 small independent nodes is far better than 2,000 highly concentrated ones.
We have a comparison for exactly what you are describing: Storj Select has far fewer nodes than Storj Public, and we were forced to implement different node selection rules to avoid concentrating data on a small subset of highly performant providers in Storj Select.
In many cases, Storj Public outperforms these high-end nodes in Storj Select, and it is only customer requirements that force them to use Storj Select over Storj Public.
I also have a feeling that, since the number of nodes is smaller, the risk of losing data is much higher, for example because of a mass update in the network of the one provider that has managed to get most of the data: all of it would go down at almost the same time. So our engineers do their best to prevent this scenario.
And now you are suggesting doing the same for Storj Public. It would immediately become a second Storj Select, with all the caveats. So, no, thank you.
If you have a high-quality setup, you may obtain SOC2 certification (or an equivalent) and join Storj Select.