We can compare the number of unique wallet addresses to the number of unique /24 blocks. Taking numbers from storjnet.info: 3805 payments for 11457 subnets. These numbers have the following drawbacks:
- A wallet address is not tied one-to-one to a single operator (some operators may use multiple addresses, against the T&C). Still, it gives at least a lower bound on the number of operators, as we can probably reasonably assume many operators do not share a single wallet.
- The number of wallet addresses is itself a lower bound, because not all addresses are paid each month.
- The number of /24 blocks on storjnet.info is likely also an underestimate.
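As a back-of-the-envelope check, the two storjnet.info figures above give the average number of /24 blocks per paid wallet. This is only a rough sketch: it treats one wallet as one operator, which the caveats above already qualify, and the variable names are mine.

```python
# Figures quoted above from storjnet.info.
paid_wallets = 3805
subnets_24 = 11457

# Average number of /24 blocks per paid wallet address.
# Assumption: one wallet ~ one operator (see the caveats in the list above).
blocks_per_wallet = subnets_24 / paid_wallets
print(f"{blocks_per_wallet:.2f} /24 blocks per paid wallet")  # prints 3.01
```

An average says nothing about the shape of the distribution, so a few large operators could still hold a big share of the blocks.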
Storj, on the other hand, can (1) look at the true numbers, fixing the second and third drawbacks above, (2) build a proper model of the distribution of wallet addresses over /24 blocks, (3) take reliability parameters (like long-term connectivity statistics) into account when modeling, (4) take the choice-of-n method into account, and (5) manually adjust for outliers like Th3van. A proper model is probably a 100-200 LoC script in Stan or a similar tool. I assume Storj has done so, knowing what kind of math was put into the whitepaper. If not, frankly speaking, I would love to be contracted to do this; I used to do this for a living.
While I acknowledge this, a nice side effect of the choice-of-n method taking latency into account might actually balance this. I am really curious to what degree.
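To get a feel for the size of that effect, here is a toy simulation of a generic latency-aware choice-of-n: sample n candidates uniformly, keep the one with the lowest latency. This is not Storj's actual selection code. The latency list, the function names and the trial count are all made up for illustration.

```python
import random

def choice_of_n(latencies_ms, n, rng):
    """Sample n distinct candidates uniformly, keep the lowest-latency one."""
    candidates = rng.sample(range(len(latencies_ms)), n)
    return min(candidates, key=lambda i: latencies_ms[i])

rng = random.Random(0)  # fixed seed so the run is reproducible
latencies_ms = list(range(1, 101))  # hypothetical nodes with latency 1..100 ms

def mean_selected_latency(n, trials=2000):
    """Average latency of the node picked by choice-of-n over many trials."""
    picks = [choice_of_n(latencies_ms, n, rng) for _ in range(trials)]
    return sum(latencies_ms[i] for i in picks) / trials

uniform = mean_selected_latency(n=1)  # n=1 is plain uniform selection
biased = mean_selected_latency(n=3)
print(f"n=1: {uniform:.1f} ms, n=3: {biased:.1f} ms")
```

For this toy list the n=3 mean should come out at roughly half the n=1 mean, so even a small n shifts traffic noticeably toward low-latency nodes. How that plays out with real latency distributions is exactly the open question.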
I would say dispersing over SNOs is probably even more important than dispersing geographically. However, I suspect it’s actually difficult for a single operator to get a significant number of locations cheaply, and the locations that would indeed be cheap (like Oracle cloud) are easy to identify through AS numbers. So we can probably substitute “SNO dispersion” with something easier to measure that has the same effect.
As to the specifics of the node selection algorithm, the last time I checked (over a year ago), extending the code to simply pick nodes with multiple unique characteristics (like /24, AS, wallet, geoIP, geolatency, etc.) wouldn’t be a big problem. The only question would be whether there is a good enough supply of performant nodes across these characteristics.
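To make the idea concrete, here is a minimal sketch of such a selection as a greedy filter: walk the candidates and skip any node that shares a value with an already chosen node on any of the tracked characteristics. This is my own illustration, not the satellite's code. The field names (`subnet24`, `asn`, `wallet`), the function name and the sample nodes are hypothetical.

```python
# Hypothetical characteristics that must be unique across the selected set.
DIVERSITY_KEYS = ("subnet24", "asn", "wallet")

def select_diverse(candidates, count, keys=DIVERSITY_KEYS):
    """Greedily pick up to `count` nodes, no two sharing a value for any key."""
    seen = {key: set() for key in keys}
    chosen = []
    for node in candidates:
        if len(chosen) == count:
            break
        # Skip the node if any characteristic collides with a chosen node.
        if any(node[key] in seen[key] for key in keys):
            continue
        for key in keys:
            seen[key].add(node[key])
        chosen.append(node)
    return chosen

# Made-up candidates using documentation IP ranges and private-use AS numbers.
nodes = [
    {"id": "a", "subnet24": "203.0.113.0/24", "asn": 64500, "wallet": "0xA1"},
    {"id": "b", "subnet24": "203.0.113.0/24", "asn": 64500, "wallet": "0xB2"},
    {"id": "c", "subnet24": "198.51.100.0/24", "asn": 64501, "wallet": "0xA1"},
    {"id": "d", "subnet24": "192.0.2.0/24", "asn": 64502, "wallet": "0xD4"},
]
picked = select_diverse(nodes, count=3)
print([n["id"] for n in picked])  # prints ['a', 'd']
```

Note that the example asks for 3 nodes but only gets 2: "b" collides with "a" on /24 and AS, and "c" collides on wallet. That is the supply problem in miniature. The more characteristics you require to be unique, the faster the pool of eligible nodes shrinks.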