Using multiple IPs on the same physical server

It seems you are missing the main thing.
Why would you ever want to use a unique public IP for each node running on the same server, even via the same ISP? Right: to bypass the /24 filter. So you already know about the filter, and you already know that it's enabled by default on the satellite; it's part of the default network behavior. Why do you want to bypass it? Right: to artificially increase the reputation of the node so that it is selected more often. So again, this goes against the default network behavior, which would otherwise not select nodes from the same location.

I quoted the exact line in the ToS:

5.  Restrictions. You will operate the Storage Node in strict accordance with the terms of this Agreement and in no other manner. Without limiting the generality of the foregoing, you will not:
...
5.1.7. Manipulate or alter the default behavior of the Storage Network to artificially increase or decrease the value of any reputation factor of any Storage Node;

which allows me to say that changing the default network behavior is not allowed.

The default network behavior includes preventing pieces of the same segment from being stored on correlated nodes, by limiting node selection to one node per /24 subnet of public IPv4 addresses and considering only nodes that are online, not disqualified, not suspended, and not full. An additional rule called “choice of n” can be applied: among each group of n otherwise equal nodes, the one with the highest success score is used for each piece. You can see the implementation details in the code on our GitHub. All blueprints are here: GitHub - storj/design-docs: Workspace for all Storj design documents
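As an illustration only (not the actual satellite code), the two rules can be sketched roughly like this in Python; the field names `ip` and `success_score` are assumptions made for the sketch:

```python
import random
from collections import defaultdict

def subnet24(ip: str) -> str:
    """Return the /24 network prefix of an IPv4 address."""
    return ".".join(ip.split(".")[:3])

def select_nodes(candidates, needed, n=2):
    """Toy sketch: candidates is a list of dicts with 'ip' and
    'success_score'. Pick at most one node per /24 subnet, then apply
    'choice of n': among each group of n otherwise equal picks,
    keep the one with the highest success score."""
    by_subnet = defaultdict(list)
    for node in candidates:
        by_subnet[subnet24(node["ip"])].append(node)
    # the /24 filter: one random node per subnet
    per_subnet = [random.choice(nodes) for nodes in by_subnet.values()]
    random.shuffle(per_subnet)
    selected = []
    for i in range(0, len(per_subnet), n):
        group = per_subnet[i:i + n]
        selected.append(max(group, key=lambda nd: nd["success_score"]))
        if len(selected) == needed:
            break
    return selected
```

Note that two nodes sharing a /24 can never both receive pieces of the same segment here, regardless of their scores; that is the property being discussed.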

Additional:

You do not need to argue with me; it will not change the answer. Evading the /24 rule for multiple nodes on the same server is not allowed. It can be relaxed for highly available setups, but that must be explicitly permitted by Storj. Usually that means concluding a contract in which all conditions are settled. So please contact us using the form if you want to participate as a Commercial Node Operator.

2 Likes

Indeed, you should be.
Because it broke a core Western public rule: freedom of speech, without needlessly hurting people.

In my opinion it was a sane discussion about the ToS not being aligned with current practice, aside from it being badly written and vague.

But I won’t say too much; my edit of the last post in that topic tells my stance towards it. Just like counting the hearts might tell you the stance of the community. That said, you have my cordial appreciation for your knowledge and efforts in this forum.

You’re taking a stance that, in my opinion, isn’t a legal one, and you’re implicitly (and sometimes explicitly) forcing us to take the same point of view. That’s what the ‘great Leader’ remark was about. Feel free to ban me or whatever you want for it. But giving space to different points of view might benefit the community, and Storj, in the end.

I’m just a little bit amused by the way Storj is handling it and a little bit annoyed by the way you’re handling the discussions about it. That’s all :wink:

Indeed. Since there is no version keeping, the ToS is void now anyway: everyone is bound to the agreement as it was at the moment they agreed, and any subsequent changes need to be agreed to explicitly.

So, from a legal point of view, we’re discussing nothing, no matter your point of view concerning the interpretation of its content.

2 Likes

I agree; however, I’m not in a position to make any legal statements. I use what I have and what’s discussed in many places.
I still think that making this manageable through the code is much better than any legal document, because then the code becomes the ruler.

Sorry about that. I still believe that keeping the customers’ data safe is much more important than any short-term benefits from breaking rules which are not fully enforced in the code.
I know the main goal of the Storj network is to provide infinite storage with infinite speed, end-to-end encrypted and practically indestructible, and that to achieve this it should also allow Byzantine operators to participate in the network and handle them properly.
But we recently started to change RS settings (see Updates on Test Data - #281 by littleskunk), and it makes me feel bad that we could lose data to greedy SNOs with lots of hardware who use any trick to get around the /24 filter, when their hardware dies, which it will if they can’t handle the expected load properly.

As I understand it, the changes to RS numbers were a conscious decision trading off some nines of reliability for speed. This decision was taken with full knowledge of adversarial node operators. The fact that operators do work around the /24 limit is known to Storj, and Storj will have nobody to blame except themselves if data loss happens due to adversaries.

Nobody on the Internet operates services under the assumption that all people are nice.

4 Likes

I see, but you’re the one who was advocating for certain things to be in the ToS. Since it’s a legal document, that statement can’t be considered anything other than legal. Especially if you’re closing topics in which people take other stances on the right interpretation of the ToS from a legal point of view (and you’re apparently not that sorry about it, given that you’re reopening them again). It would be different if you said what the ToS is meant to say (rather than what it really says), or just that you think this or that is the right interpretation of the ToS, since in the end you’re also entitled to your opinion.

I think so too. But that’s a different matter from what the ToS says and whether the ToS should be aligned with current practice. But better to refrain from any references to the ToS, then? Perhaps even abandon it altogether?

Again, you’re entitled to your opinion. And, yeah, the world would be a better place if everyone was honest and selfless. But since that’s not the case, you should take that into consideration in every step you take.

Although, I did the math once: even if 10% of SNOs were using 10 IPs for the same hardware, it was equivalent to 2-3 lost pieces at most.

Besides, this also should be managed by the code. So why bother?

I do not have numbers for how many SNOs are bypassing the /24 rule, so I cannot say how many pieces could be lost. Just looking at the number of nodes versus the number of subnets, it could be much more than 10%. With reduced RS numbers there is a higher risk of ending up with an unrecoverable segment, because most of its pieces were held by one server instance without redundancy.
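To make the risk concrete, here is a hedged back-of-the-envelope sketch. The RS numbers (29 pieces needed out of 80 stored) and the independence assumption are illustrative only, not the current production values; the point is that shrinking the margin between stored and needed pieces raises the probability that one correlated server holds enough pieces to make a segment unrecoverable:

```python
from math import comb

def segment_loss_probability(n=80, k=29, f=0.1):
    """Probability that a single correlated server holds more than
    n - k pieces of one segment, assuming each of the n pieces lands
    on that server's subnets independently with probability f.
    n, k and f here are illustrative assumptions."""
    threshold = n - k  # losing more than this many pieces is fatal
    return sum(comb(n, m) * f**m * (1 - f)**(n - m)
               for m in range(threshold + 1, n + 1))
```

With a wide margin (n=80, k=29) the result is astronomically small even for f=0.1; shrink the margin between n and k and it grows quickly.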

I expressed my concerns about this situation here on the forum and internally. So we improved the code and the node selection algorithm. I hope it will help to distribute pieces more safely, with knowledge of Byzantine behavior, before any financial instruments are applied.

3 Likes

Yeah, could be; could be much less too. The number of topics from people having trouble setting up port forwarding with a VPN is reassuring though: apparently you do need more skills than John from next door. The fact that the people doing so are probably more skilled than average means they can likely keep a node running even with the hardware shared between nodes running on multiple IPs.

But, if we really want to discourage using multiple IPs when unnecessary, we should refine the selection process.

  • First, I essentially disagree that you only want to disperse data geographically; I would say you also want to disperse it over SNOs (so that if there’s another, more profitable project, they don’t all quit at once, leaving Storj with a geographically dispersed gap in the network).
  • We want SNOs to have as few payout addresses as possible, in order to face lower fees.
  • We want them to stay as long as possible in the network, in order to keep the repair traffic low.
  • Ideally you want to fill up the most powerful nodes first.
  • You want people to stay online as much as possible.
  • You want SNOs to prefer a graceful exit over dropping out of the network without notice.

Assumptions:

  • Long-staying SNOs and nodes with better performance have gathered more data.

Then I could think of a rating factor on the satellite for node selection like:

min(
  (
    [data_in_TB_with_same_wallet]
  *
    [data_in_TB_with_same_wallet_and_ip]
  )^(1/4),
  0.25
)

This might make people more eager to use as few IP addresses as possible and the same payout address for all nodes, since if you use the same IP and payout address for all data, it just boils down to: min(√[data_in_TB], 0.25).

Then you disperse over different IPs and wallets. It prefers longer-staying SNOs over shorter-staying ones, although 16 TB makes you only twice as preferred as 4 TB. This method might even make vetting unnecessary, since it prefers node operators who have already shown they are able to operate a node.

It could even be enhanced by:

  min(
    (
      (
        [data_in_TB_with_same_wallet]
      -
        [data_lost_in_TB_with_same_wallet]
      )
    *
      [data_in_TB_with_same_wallet_and_ip]
    )^(1/4),
    0.25
  )
*
  [average_online_score_with_same_wallet]
* 
  [average_audit_score_with_same_wallet]^6

The lost data is that of nodes disqualified without a graceful exit.

But this might need some bookkeeping, because some node operators might be tempted to change their payout address as soon as there are SMART errors, or when the online score runs low for any reason. Or, when a node fails, simply check in again with another payout address and no data.
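The proposed factor can be written as a minimal Python sketch, using hypothetical parameter names for the per-wallet statistics (negative differences are clamped to zero for safety, an assumption not stated in the formula above):

```python
def rating_factor(data_tb_wallet, data_tb_wallet_ip,
                  data_lost_tb_wallet=0.0,
                  avg_online_score=1.0, avg_audit_score=1.0):
    """Sketch of the proposed selection weight (hypothetical names):
    a capped base term, multiplied by the average online score and
    the average audit score raised to the 6th power."""
    kept = max(data_tb_wallet - data_lost_tb_wallet, 0.0)
    base = (kept * data_tb_wallet_ip) ** 0.25
    return min(base, 0.25) * avg_online_score * avg_audit_score ** 6
```

For a single IP and wallet holding all the data with no losses and perfect scores, this reduces to min(√data, 0.25) as described earlier.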

PS: @Alexey you forgot to unmark your answer as the solution. Great job though.

3 Likes

We can compare the number of unique wallet addresses to the number of unique /24 blocks. Taking numbers from storjnet.info: 3805 payments for 11457 subnets. These numbers have the following drawbacks:

  • While a wallet address is not tied 1-to-1 to a single operator (some operators may use multiple addresses, against the T&C), it at least gives a lower bound on the number of operators, as we can probably reasonably assume many operators do not share a single wallet.
  • The number of wallet addresses is itself a lower bound, as not all addresses are paid each month.
  • The number of /24 blocks on storjnet.info is likely also an underestimate.

Storj, on the other hand, can (1) look at the true numbers fixing the second and third drawback above, (2) actually do a proper model of wallet addresses distribution over /24 blocks, (3) take into account reliability parameters (like long-term connectivity statistics) when modeling, (4) take choice-of-n method into account, (5) manually adjust for outliers like Th3van. A proper model is probably a 100-200 LoC script in STAN or a similar tool. I assume Storj has done so, knowing what kind of math was put into the whitepaper. If not, frankly speaking I would love to be contracted to do this—I used to do this for living.
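For a quick sense of scale, the raw ratio from those storjnet.info numbers, with all the caveats above, works out to about three /24 subnets per paid wallet:

```python
wallets = 3805   # paid wallet addresses (storjnet.info)
subnets = 11457  # distinct /24 subnets (storjnet.info)
ratio = subnets / wallets
print(f"about {ratio:.2f} /24 subnets per paid wallet")
```

This is only a crude upper-bound-flavored hint, not the proper model described above; both counts carry the biases listed in the bullets.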

While I acknowledge this, a nice side effect of the choice-of-n method taking latency into account might actually balance this. I am really curious to what degree.

I would say dispersing over SNOs is probably even more important than dispersing geographically. However, I suspect it’s actually difficult for a single operator to get a significant number of locations cheaply, and the locations that would indeed be cheap (like Oracle cloud) are easy to identify through AS numbers. So we can probably substitute “SNO dispersion” with something easier to measure, with the same effect.

As to the specifics of the node selection algorithm, the last time I checked (over a year ago), extending the code to simply pick nodes with multiple unique characteristics (like /24, AS, wallet, geoIP, geolatency, etc.) wouldn’t be a big problem. The only question would be whether there is a good enough supply of performant nodes across these characteristics.
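A greedy sketch of what “multiple unique characteristics” could look like; the field names are assumptions for illustration, not the actual node records:

```python
def select_diverse(candidates, needed, keys=("subnet24", "asn", "wallet")):
    """Greedy sketch: pick nodes so that every listed characteristic
    (hypothetical field names) is unique among the selected set."""
    seen = {key: set() for key in keys}
    selected = []
    for node in candidates:
        if any(node[key] in seen[key] for key in keys):
            continue  # shares a /24, AS or wallet with an earlier pick
        for key in keys:
            seen[key].add(node[key])
        selected.append(node)
        if len(selected) == needed:
            break
    return selected
```

The supply question shows up directly here: the stricter the uniqueness keys, the earlier the candidate pool runs dry.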

1 Like

I actually think it’s both: if there’s a natural disaster, war or a meltdown of a nuclear power plant somewhere, you don’t want too many pieces of a certain file in the same region. Even random distribution might make this happen, though, if the number is big enough, so better to check/stratify for it. But I think dispersing over SNOs also makes the use of many IPs by the same SNO less of a problem.

Probably too complicated, and therefore too slow to calculate every time. I was thinking more of a constant factor, recalculated every so many days, which would be used in the query for node selection.

I don’t think so, since it only takes the success rate into account as a proxy for throughput, meaning something like average(2*latency + filesize/bandwidth). Since the latency is quite short in comparison to the time needed to put the file through the wire, latency essentially isn’t that much of a factor.

I myself have some nodes in different locations, of which some are behind CGNAT and some use ISP addresses. Using WireGuard only adds 4-50 ms to the total upload time of a file. That’s quite negligible and therefore probably won’t influence the success rate that much, unless the bandwidth of the VPN is the limiting factor (which it isn’t in my case).
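To put numbers on the latency-versus-bandwidth trade-off, here is a small sketch of the throughput proxy mentioned above (2 * latency plus time on the wire); the piece sizes and the 100 Mbit/s link are illustrative assumptions:

```python
def upload_time_ms(piece_bytes, bandwidth_mbps, latency_ms):
    """Rough upload-time model: 2 * latency plus the time needed to
    push the piece through the wire. Values are illustrative only."""
    transfer_ms = piece_bytes * 8 / (bandwidth_mbps * 1e6) * 1e3
    return 2 * latency_ms + transfer_ms

# a ~2 MiB piece vs a 16 KiB piece on a 100 Mbit/s link, 20 ms away
big = upload_time_ms(2 * 1024 * 1024, 100, latency_ms=20)
small = upload_time_ms(16 * 1024, 100, latency_ms=20)
```

For the large piece the wire time (~168 ms) dominates the 40 ms of round trips; for the small piece it is the other way around, so whether added VPN latency matters depends heavily on the piece-size mix.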

How does it account for people actually having deployed nodes at different locations with the same email and wallet address?

Or, operators with a single setup connected to 18 VPN endpoints they pay $2/month to access?

There might be a way to do some downtime correlation analysis. But again, a few days ago I updated all the TrueNAS servers I manage, which also run nodes, all at once, and rebooted them at the same time (they replicate to each other, so rebooting all at once is optimal). Because I do maintenance on all machines at the same time, their downtime is highly correlated, and yet they are independent computers on independent ISPs, geographically dispersed.

I thought a lot about how to detect misuse and I could not come up with any solution, without also almost immediately coming up with ways to work around it.

1 Like

This might be my bias, but I find adversary humans worse than natural disasters…

The STAN model is not about computing anything per upload; it’s about computing the risk for the network as a whole. It’s the equivalent of the Bayesian math from the whitepaper, but taking into account all developments since.

Not true for me. Most files in the network are small, making the filesize/bandwidth component small compared to latency for most attempts. I’ve looked at success rates per file size bucket on my nodes, and it was easy to see my success rate for small files is much worse than for big files. I’ve got a good pipe, I’m just somewhat far from uploaders.

Same operator means a potential for a systematic error across many nodes managed in the same way (a botched TrueNAS update?) and/or just operator deciding to leave Storj altogether. So yes, this does account for this scenario :stuck_out_tongue:

There is a finite number of $2/month VPN endpoint services, and with enough operators trying this trick these services themselves will become easy to target with AS numbers.

1 Like

This eliminates all operators behind CGNAT. But maybe this is not a bad thing, I don’t know.

Indeed. Though, the better the network gets at reducing correlated failures, the smaller RS numbers can be, leading to bigger pieces of a single segment going to per-AS buckets.

1 Like

Yes, I want to know that too. The success rate is not only about latency; it’s more a success rate for finished uploads, so it would effectively prefer quicker nodes. Of course the latency should play a role too.
There is also another factor: some VPSes have a bandwidth limit, so those nodes would be forced to switch back to their own IP, and the repair job could start moving pieces from these nodes to other, uncorrelated ones.

I think there should be insight into the correlation of these values, not so much for the community but for Storj. It shouldn’t actually be the distinct values, but the correlation between them.

It doesn’t matter whether it is the same node operator, VPN, cluster, power plant or the same ISP. In all those cases they share something in common that is a single point of failure for them all. So a correlation analysis over time on the same wallet address, mail address, downtimes and IP addresses might be a trick to get, in each case, nodes that are as independent as possible. If having more TB per wallet and/or IP is also favoured in node selection, people are less likely to cheat on those.
I don’t think of it as a means of selecting nodes for banning or suspension, but more as a means to get the data dispersed as widely as possible with the fewest shared points of failure.

For sure, but the scale of it is just quite a bit smaller than natural disasters…

Indeed, you can compensate for the lost time due to increased latency by high/increased bandwidth.

Or just make sure no more than one piece per segment ends up with any one VPN provider. That won’t eliminate them, but it might reduce the number of uploads they win. And if we do the same for ISPs and data centers, I think this effect might be attenuated.

1 Like

Yet humans are arrogant and seek all possible loopholes. Nature is more predictable.

Nature can be gentle. Nature can be kind. Nature can be beautiful.

And nature has the power to absolutely mess you up. It is relentless and eternal. It always wins in the end.

Humans have just learned to stay out of its way… and deflect what they can when they can’t avoid it… :wink:

Here’s the elephant in the room. Artificial constraints will force the system to adapt to these artificial constraints instead of optimizing for quality. Each new rule will be a breeding ground for corruption. Constraints only create problems for honest participants. Sneaky ones will hack the rules anyway. And you shouldn’t blame them. More rules => worse network.

The Internet is already shock-resistant, and the payout is enough of an incentive for SNOs in different locations to run nodes on their diversely distributed servers.

The best thing Storj can do is to incentivize nodes to be fast and reliable, and leave the network to optimize on its own.

The worst thing is trying to control and micromanage, taking excessive action to dampen its own earthquake anxiety.

Remember the concept of antifragility? Rigid, over-regulated systems break down at the first shock. Free and flexible survive.

3 Likes