Multiple nodes, a total of 16 nodes on one IP

In any setup and arrangement there will always be a small number of participants abusing the system. They will always exist, and their number will always stay small.

Maybe the technical solution here is not to try to enforce 100% honesty, but instead to tolerate a small percentage of cheaters? For example, increase redundancy slightly.
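To make “increase redundancy slightly” concrete: with erasure coding, tolerating a small colluding fraction is cheap. A back-of-envelope sketch, assuming illustrative Reed-Solomon parameters (k = 29 pieces needed out of n = 80 stored; the real satellite settings may differ):

```python
import math

def tolerable_correlated_fraction(k: int, n: int) -> float:
    """Largest fraction of a segment's n pieces that can vanish
    together while the segment stays reconstructible from k pieces."""
    return (n - k) / n

def extra_pieces_needed(k: int, n: int, cheater_fraction: float) -> int:
    """Extra pieces to store so that losing `cheater_fraction` of the
    total still leaves at least k survivors."""
    n_required = math.ceil(k / (1 - cheater_fraction))
    return max(0, n_required - n)

print(tolerable_correlated_fraction(29, 80))  # ~0.64
print(extra_pieces_needed(29, 80, 0.05))      # 0: a 5% cheater share is already absorbed
```

In other words, under those assumptions a few percent of correlated nodes sits far inside the margin the existing redundancy already provides.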

I’m sure Storj can ballpark the number of suspiciously correlated nodes; I don’t think it will be high enough to justify wasting engineering time squeezing out these exponentially harder-to-achieve last drops of compliance.

The Operator can run different nodes in different places, so we cannot assume that all these nodes are in one physical location or on the same hardware.
It is not our goal to pay less to multi-node operators; we want customer data to be secure.


It is possible to detect this automatically and mark them as one subnet; then this group of nodes will not get more than one piece of a segment. Thanks!
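If you squint, that is just extending the key the placement logic already deduplicates on. A hypothetical sketch (the `DETECTED_GROUPS` table and the names are made up, not Storj code):

```python
# Made-up output of whatever correlation detector flags the nodes:
DETECTED_GROUPS = {
    "node-17": "cluster-3",
    "node-42": "cluster-3",
}

def placement_key(node_id: str, subnet: str) -> str:
    """Nodes in a detected group share one key, exactly as if they
    shared a /24, so the group holds at most one piece per segment."""
    return DETECTED_GROUPS.get(node_id, subnet)

# Two nodes on different subnets still collapse to one placement slot:
assert placement_key("node-17", "1.2.3.0") == placement_key("node-42", "9.9.9.0")
```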

So all operators behind a single CGNAT ISP that happens to experience an outage will now be lumped together as one node?

Only a CGNAT ISP?
It will be funny when Vodafone, one of Germany’s largest providers, has one of its frequent outages and tens or hundreds of nodes go offline at the same time. 😛


I run 8 machines, each in its own subnet and in a different physical location, but within a small geographical area. Internet outages and blackouts are pretty common and take out my nodes in groups, or all at once. But in rare cases, only one.
So to distinguish me from an abuser, there should be a history of maybe 3-6 months that tracks the outages. If over 6 months some nodes of the same operator go dark at the same time, and never go dark independently, then you can assume they are in the same location, or in a geographical area small enough to be equivalent to the same physical location.
In the end, the physical location doesn’t matter at all, just the time spent online and offline. If some nodes spread across the globe ALWAYS go dark at the same time, then, from Storj’s point of view of data availability for customers, those nodes can be treated as one node.
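A toy version of that tracking, assuming each node’s history is reduced to a set of time buckets (say, hours) in which it was offline; everything here is illustrative, not satellite code:

```python
def downtime_correlation(offline_a: set, offline_b: set) -> float:
    """Jaccard similarity of two offline histories: 1.0 means the nodes
    were only ever down together, 0.0 means never at the same time."""
    union = offline_a | offline_b
    if not union:
        return 0.0
    return len(offline_a & offline_b) / len(union)

# Two nodes whose outage hours coincided exactly over the window:
a = {100, 101, 250, 251, 252}
b = {100, 101, 250, 251, 252}
print(downtime_correlation(a, b))           # 1.0 -> candidate for "same location"
print(downtime_correlation(a, {100, 400}))  # low -> mostly independent failures
```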

Another way of identifying a location would be the MAC address of the router and machine, but I don’t know if that is practical.

I mean, that’s why I said to use multiple signals. Coordinated downtime might be a strong predictor, but ISP maintenance can cause it as well. As can regional power outages. It might be worth using data that quite obviously points to people running multiple nodes to train statistical models to find other such signals.

Unfortunately no single signal is a black and white indicator.

Well, writing a script to randomly shut down nodes for random intervals is not that hard.

This is likely the path forward, but it would require a massive number of nodes, a lot of data collection, machine-learning expertise, and a lot of training effort that is really a distraction, all to weed out 1% of cheaters. Which, ultimately, will be impossible.

An example I think of in this context is actually me: I have one node in CA, and another node in OR at my brother’s house. But he is behind CGNAT, so I’ve set up a VPS on Oracle to route traffic.

On my node I don’t want to expose my IP address or mess with DDNS, so I route traffic to my node through another VPS on Oracle as well. This allows me to do some interesting things, like LTE failover when my ISP has an outage, without messing with DNS and routing: the node just reaches out to my VPS from whatever connection is available. It’s super handy.
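For anyone curious how such routing can work: one common approach (an assumption on my part, the post doesn’t say which tool is used) is an SSH reverse tunnel, where the node’s machine dials out to the VPS and the VPS forwards the public port back through that connection. A minimal keep-alive wrapper, with a hypothetical VPS address:

```python
import subprocess
import time

VPS = "user@vps.example.com"  # hypothetical VPS address
NODE_PORT = 28967             # storagenode's default external port

while True:
    # -R exposes VPS:28967 and tunnels it back to localhost:28967 here;
    # the VPS needs `GatewayPorts yes` in its sshd_config to bind 0.0.0.0.
    subprocess.run([
        "ssh", "-N",
        "-o", "ServerAliveInterval=30",
        "-R", f"0.0.0.0:{NODE_PORT}:localhost:{NODE_PORT}",
        VPS,
    ])
    time.sleep(5)  # reconnect after a drop: this is what makes LTE failover seamless
```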

But in this case these nodes look indistinguishable: same number of hops, same (very close) latency, same datacenter, even IP addresses in different /24 segments. I could have been a jerk and run a second node in my house using a different VPS from the same network, and there would be no way to tell. No amount of meta-analysis would have detected this, because it’s a perfectly symmetric situation.

Point being, you can’t eliminate 100% of cheaters, so I vote for incorporating some “shrinkage” into the node-diversity assumptions and spending the effort elsewhere.

Yes. The /24 subnet limit is used for the same reason: one big ISP, which can shut down part of the network, should not get more than one piece of the same segment.
So I do not see an issue here. Subnet grouping could have a learning curve like the audit or suspension scores, i.e. a weight between 0 and 1 used as a percentage of strictness. But I guess it would work more like an online score calculation, i.e. depend on some time window, for example.
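A sketch of that “weight between 0 and 1” idea, loosely modeled on how the audit/suspension scores decay (the real scores use a different formula; `LAMBDA` here is a made-up knob controlling the time window):

```python
LAMBDA = 0.95  # closer to 1.0 = longer memory / wider time window

def update_correlation_score(score: float, down_together: bool) -> float:
    """Exponentially weighted score for a node pair: nudged toward 1
    when they go offline together, toward 0 when one fails alone."""
    observation = 1.0 if down_together else 0.0
    return LAMBDA * score + (1 - LAMBDA) * observation

score = 0.5
for _ in range(100):  # a long streak of shared outages
    score = update_correlation_score(score, True)
print(round(score, 3))  # ~0.997 -> strict "same subnet" treatment kicks in
```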

So, in general I propose to measure the correlation between nodes. If we connected AI to the process, it could be even more precise, and downtime spikes wouldn’t affect the “correlation score” too much.


Adversaries could simulate short downtimes within allowed ranges on single nodes to make the signal difficult to extract.

Not in the case of a learning curve, as with the audit/suspension score.
This would be accounted for and would result in something like 0.8, which is still close enough to consider them one subnet.

However, I believe that if such a measurement were in place, there would be no incentive to circumvent it… at least from an investment perspective…

To solve the problem that two pieces should not end up on the same machine, I would just filter by email account or payment address, as was proposed here. It makes such a negligible difference (if any) in ingress, even for a large operator, that there won’t be an incentive to work around it. The advantage in preventing file loss is huge.

It’s not only the same location that is a problem IMO; multiple pieces should also not end up under the same operator. I remember somebody stopping all 30 of his nodes a few months back because he didn’t have time for them anymore…


We don’t have accounts for SNOs, and the email address can be omitted in the node’s config; it is used only for notifications.
Perhaps you are right about too-heavy correlation between nodes of the same Operator, but I do not like the idea of limiting by wallet address: it will force operators to use a different wallet for each node. That has a couple of disadvantages, though: they would need to manage multiple addresses to collect the total payout, and it would also increase the time before the first payout, unless they used zkSync.
However, this could also be used as an exploit: spin up as many nodes as possible with different wallet addresses, because each node would be treated as uncorrelated when filtering by wallet.

My thought is that the redundancy reasoning for limiting multiple nodes on one segment doesn’t stack up on a global scale. If Storj were really serious about redundancy and spreading data pieces, then I’m sure they would pay more for nodes in locations that actually offer redundancy, rather than the same as someone stacking up nodes in one of the highly node-populated EU countries. The location map of nodes is a glaring falsehood of node redundancy and risk reduction. I can see Putin lobbing a bomb into the power grid of one of those highly node-populated cities, and Storj customers being so impacted that they have no alternative but to leave and go elsewhere, knowing it could happen again. Ukraine, 9th on the node list, that’s got to be risky. I doubt putting nodes on different segments is really the issue there or thereabouts.

I think if that happens we have greater problems to worry about than cloud storage of files. It would be considered a force majeure event in any case.


The Storj network is designed to circumvent these force majeure cases. So they say.

Even if they manage to combat the VPS abuse, I think this doesn’t solve the centralisation problem. The race condition built into the Storj network, by storing pieces on the fastest-responding nodes, is the main cause. Maybe I’m wrong, I don’t really know how the nodes are chosen by the satellites.

This is exactly how it works with subnets now: they select a set of subnets and then pick a random node within each subnet for each piece. This results in nodes within a subnet sharing ingress, as we see now, and thus a major impact. Unless you have a fast way for satellites to ensure enough nodes and deduplicate on email address without sharing ingress evenly within those clusters, doing this for email would result in the same thing.
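A condensed sketch of that two-stage selection (not the real satellite code), which also shows why nodes sharing a subnet split ingress:

```python
import random
from collections import defaultdict

def select_nodes(nodes_by_subnet: dict, pieces_needed: int) -> list:
    """Stage 1: pick distinct /24 subnets; stage 2: one random node each."""
    subnets = random.sample(list(nodes_by_subnet), pieces_needed)
    return [random.choice(nodes_by_subnet[s]) for s in subnets]

nodes_by_subnet = defaultdict(list)
for node, subnet in [("A", "1.2.3.0"), ("B", "1.2.3.0"), ("C", "5.6.7.0")]:
    nodes_by_subnet[subnet].append(node)

# "A" and "B" share 1.2.3.0/24, so each wins roughly half of its ingress:
print(select_nodes(nodes_by_subnet, 2))
```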

OK, something a little less dramatic, like Russia blocking EU internet sites. A 5-minute job to block all the Storj satellites being accessed from Russia. That would take a few thousand nodes offline. They’d likely get a few back with a VPN, but would lose a lot, and then there is the slower response via the VPN.

Storj has already taken precautions against Russia blocking internet services to the rest of the world.
