Centralization of traffic via Gateway-MT, as connections no longer come directly from the client but via the gateway

So I thought I would post an interesting observation of my node's usage over the past 3 months, and reach out to Storj to get their view on this behaviour.

Throughout this post I am ignoring repair traffic and dev load data, as they skew everything.

The issue we are focusing on is: "of the nodes in the list generated by the satellite, the ones that successfully end up storing the segments are effectively selected by the customer, based on which have the best connection and performance - your upload gets cancelled early, or the client connection dropped late, if your node doesn't respond quickly enough."

TL;DR - 1) My node no longer sees the above as true. With Gateway-MT, the outcome is strongly biased towards nodes geo-located closer to the Gateway-MT servers, out of the list of nodes returned by the satellite. The connection is no longer from client to SNO; instead it goes from a client anywhere in the world to a few Gateway-MT servers located in highly performant but centralised datacentres, and only then from the Gateway-MT to the SNO. This biases towards SNOs who are geographically, and ISP-connection-wise, closer to the Gateway-MT servers, impacting the geographical distribution of nodes; nodes on more latent links can't respond as quickly.

  1. It’s now impossible to geo-block countries on my firewall for data being uploaded to my node, as it now mostly all appears to come from the pool of Gateway-MT servers. How will Storj deal with compliance with local country laws when nodes can’t filter traffic themselves?

  2. Gateway-MT is clearly really popular (good), but I love Storj for its decentralized nature, and we now have highly utilised, centralized single points of failure that can impact node selection and piece distribution. S3 compatibility, for me, is definitely more of a backup / long-term storage option as opposed to low-latency transactional access. How long before we have a beta of gateways that can be hosted by SNOs?

/end TL;DR

More info

The information I’m quoting is from my own node experience, which I accept might not be typical for the network, and I accept there may be mistakes in my assumptions and my English, so sorry for that. My setup is very basic, but it manages to work, and I can see from the forum there are far more advanced setups out there, definitely at enterprise scale.

Background (not complete)

All my node traffic is pre-processed and logged by a pfSense firewall cluster and sent to my dedicated Logstash cluster for processing (not just for Storj).

I’ve got some Logstash rules on there to enrich the metadata around IP addresses and connection state, with advanced GeoIP among other bits.

I’ve also got my Storj node Docker logs integrated into Logstash using a Docker shim and Filebeat.

The output is sent into an Elastic cluster, and visualized on my Kibana server.

The purpose of all the log processing was to support some work I am doing on IDS: dynamic firewall control around block/allow decisions, and packet shaping for traffic, based on a more intelligent rules engine (not relevant here, but it explains why I’ve bothered).

One of the side effects of the above project is that I’m able to map, with a high degree of accuracy, a client connection to a Storj node action, be it put / get / delete, and furthermore map it to the satellite and segment ID, and ultimately the file location.
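Roughly, the kind of correlation involved looks like the sketch below. This is only an illustration: the log line shape, the "Remote Address" / "Piece ID" field names and the GeoIP table are assumptions, not my exact pipeline.

```python
# Illustrative sketch only: the log layout and field names ("Remote Address",
# "Piece ID") are assumptions, not the exact storagenode / pfSense formats.
import json
import re
from collections import Counter

# Hypothetical GeoIP / classification table built on the firewall side,
# keyed by source IP.
geoip = {
    "198.51.100.7": {"country": "US", "is_gateway_mt": True},
    "203.0.113.9": {"country": "DE", "is_gateway_mt": False},
}

# Assumed shape of a storagenode docker log line with a JSON payload.
LOG_RE = re.compile(r"piecestore\s+(uploaded|downloaded|deleted)\s+(\{.*\})")

def classify(line: str):
    """Map one node log line to (action, piece id, traffic bucket)."""
    m = LOG_RE.search(line)
    if not m:
        return None
    verb, payload = m.group(1), json.loads(m.group(2))
    ip = payload.get("Remote Address", "").split(":")[0]
    meta = geoip.get(ip, {})
    bucket = "gateway-mt" if meta.get("is_gateway_mt") else "direct/uplink"
    return verb, payload.get("Piece ID"), bucket

counts = Counter()
with open("storagenode.log") as fh:
    for line in fh:
        result = classify(line)
        if result:
            verb, _piece_id, bucket = result
            counts[(bucket, verb)] += 1

for (bucket, verb), n in sorted(counts.items()):
    print(f"{bucket:>14} {verb:<10} {n}")
```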


So, the purpose of this post: 3 months ago my cancel / fail rate was 0.1-0.9% over a 1-week period.

My traffic was predominantly from:

  • uplink clients, or self hosted S3 gateway from all over the world ~ 70% traffic
  • transfer.sh ~ 10% traffic
  • gateway-MT ~ 20% traffic

Now, as of Aug 21, this traffic has changed in a really exciting way, but also with consequences.

My traffic is now:

  • uplink clients, or self hosted S3 gateway from all over the world ~ 30% traffic
  • transfer.sh ~ 5% traffic
  • gateway-MT ~ 65% traffic

It is great to see Gateway-MT being so popular, but the usage pattern this S3 data brings to a node, which seems to be very much cold storage, isn’t great :slight_smile:

  • failed / cancelled uploads
  • big segment uploads from a small pool of IPs, where we can’t tell who the client is
  • huge deletes of data after 7/14/30 days, so presumably backup jobs
  • no / limited egress traffic, as there is no need to read the data if it’s backups
  • very full trash cans with GBs of data that we aren’t compensated for once the job has deleted it
  • excessive disk IOPS from the move-to-trash code. I get it, but when we are talking GBs going into the trash, that is a read / write that wears on the disk, not to mention the fragmentation.

Discuss :smiley:

CP

Where did you get that statement from? AFAIK node selection is completely random, but since not all selected nodes are needed, the fastest ones win, which are of course the closest ones. So the closest nodes within that random selection are used, but there is no geo-targeted selection of nodes.
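A toy simulation of that long-tail race is below; the numbers (110 nodes contacted per segment, the first 80 kept) and the latency model are illustrative assumptions, not exact network parameters.

```python
# Toy model of "random selection, fastest subset wins": one node with extra
# latency (e.g. far from the gateway) races against the rest of the selection.
# All numbers here are illustrative assumptions.
import random

def upload_race(candidates: int = 110, needed: int = 80, trials: int = 10_000) -> float:
    """Estimate how often the slower node still finishes among the first `needed`."""
    slow_wins = 0
    for _ in range(trials):
        others = [random.gauss(100, 30) for _ in range(candidates - 1)]  # latency in ms
        slow_node = random.gauss(160, 30)  # +60 ms on average
        cutoff = sorted(others + [slow_node])[needed - 1]
        if slow_node <= cutoff:
            slow_wins += 1
    return slow_wins / trials

if __name__ == "__main__":
    print(f"slower node wins roughly {upload_race():.0%} of the races it is selected for")
```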

(Sorry I didn’t read the long rest of your post though…)

Good data!

My first feeling about the problem is, while S3 as a protocol is a second-class citizen in the Storj world (requiring a gateway, as opposed to directly connecting to nodes), it is going to be the most used approach by everyone except those who need all the performance.

Yet I think there’s a potential long-term risk that rational nodes with worse connectivity to these central points of distribution will simply filter, at the firewall level, any connection attempts from them.

TBH, I was already thinking of extending the storage node code so that it would track the number of successful and failed attempts to store data from each IP… let’s say, from each /24 block ;-), and if the ratio between these two values crosses some threshold, drop the connection before any data gets sent.
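A rough sketch of that bookkeeping is below; the thresholds and helper names are made up for illustration, and this is obviously not actual storagenode code.

```python
# Sketch of per-/24 accept-ratio bookkeeping as described above.
# Thresholds and names are illustrative, not storagenode code.
import ipaddress
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class BlockStats:
    ok: int = 0       # uploads that were fully stored
    failed: int = 0   # uploads cancelled / lost the race

    @property
    def accept_rate(self) -> float:
        total = self.ok + self.failed
        return self.ok / total if total else 1.0

stats = defaultdict(BlockStats)  # /24 block -> BlockStats

def block_of(ip: str) -> str:
    """Collapse an IPv4 address to its /24, e.g. 192.0.2.57 -> 192.0.2.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def record_upload(ip: str, success: bool) -> None:
    s = stats[block_of(ip)]
    if success:
        s.ok += 1
    else:
        s.failed += 1

def should_accept(ip: str, min_samples: int = 500, min_rate: float = 0.05) -> bool:
    """Drop the connection early once a block has history and almost nothing sticks."""
    s = stats[block_of(ip)]
    if s.ok + s.failed < min_samples:
        return True  # not enough data yet, keep accepting
    return s.accept_rate >= min_rate
```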

The gateway may also be used as a stepping stone by many customers. It’s easy to change a bit of code in your application to switch to another S3-compatible provider, but a lot more of an investment to rebuild code to work with libuplink. Even the customers that intend to do that long term probably start out by using the gateway. In the end this has to be a customer choice though. For many, the compromises of the gateway will be worth it. But the long-term benefits of direct connections will definitely matter to others.

What’s the use of this? Even if only a small amount of data ends up on your node, that’s still an amount you can make money on. Are your connections being saturated? HDD struggling to keep up or something? If not, I really don’t see the point of messing with any transfers.

2 Likes

Yes, apparently that’s what customers are choosing despite the downsides. But I think we should give them some more time to adapt their projects to the native implementation. They may still switch, but it will require a little more time investment.

Imagine you could identify an IP range where all uploads coming from that range had an accept rate of 1%. So for each 1 GB of ingress, only 10 MB is effectively stored. Does it then make sense to even try accepting that data? You’d effectively be wasting bandwidth that could be used to serve other uploads.

This is, for now, a theoretical situation, but if Storj becomes a few orders of magnitude larger, I believe this might be a rational choice for some node operators.

I don’t think so, as the number of nodes scales with that growth. It’s unlikely that your node will get significantly more traffic. So yeah, if there is no bottleneck, I don’t care if it costs a bit more bandwidth to store data from far-away IP blocks. I’ll take what I can get.

Again, imagine Storj being a few orders of magnitude larger. Then most storage will sit in proper data centers with good connectivity to Gateway-MT or other centralized points, because at that scale it will be worth paying for data centers. The small players can’t compete on latency or speed with data centers, so they will only have a chance at winning races within, let’s say, a close geographical area. Maybe within a single ISP, maybe within a large city.

1 Like

the payment for cold storage is still fine, so the more customer data the better imo.

personally i haven’t seen any issues with successrates dropping, mine hover in the 99% range most of the time… generally it depends on my storage activity; if i limit the activity i get 99.9% on almost everything except uploads, because i force them directly to storage before sync acknowledgements are done.
zfs, using zfs set sync=always - sort of similar to turning off your hdd’s cache in windows, but not quite.

if i set sync=standard then my upload successrates also go up to 99.9%, while they are usually more like 99.7% max, again depending on storage activity.

so far as i can tell there has been no change in my successrates since i finished fine tuning my setup, so i think it’s unlikely that it’s the gateway’s fault… or i’m in a favorable geolocation.

we have in the past tested ingress, and each node running in optimal conditions would get the exact same ingress… but it has been a while since we last tested it… it would be fairly easy to retest though.

@CutieePie are you sure your storage isn’t running into iops limitations?

That can only be based on a complete misunderstanding of both how the network works and what the goals of the network are. Nodes are selected at random, so even if half of the nodes were in data centers with super high speed, you would still get plenty of data. But it is no one’s goal to have data centers as nodes. Storj is optimized to work with low-powered nodes that do not require the massive expense of a data center.

It’s not what the network is built for, and even if it did happen, it wouldn’t significantly impact small node operators. These are all hypothetical fears based on false premises.

Nice one! Your node(s) supposedly want to live so long :slight_smile:

1 Like

how big is your node?

generally about 400 iops (1 hdd) worth should be plenty, or it is for me at least… but i do have a SLOG SSD.

the SLOG SSD for ZFS is really important, as without a SLOG, ZFS will double the iops due to the ZIL.
one can also move the database to an SSD, but with an L2ARC or plenty of RAM, ZFS will cache most of that anyway.

also the newer nodes seem to have more files on the same amount of storage, so newer nodes will see more iops than older nodes… not sure if that is a trend or just how it happens to be for now…

it does make sense to put the worst workloads on the newer nodes: if they drop out, the network will already have gotten the advantage, and if people create many nodes, they will need a lot of hardware.

stuff like an SMR hdd could have problems, as uploads are the majority of the traffic for a long time after making a new node. it does sound odd that your system by default wouldn’t be able to keep up…

but maybe move your databases to an ssd, i think that was the go-to solution for reducing storj iops, or add an l2arc if your pool is zfs

With logs and DBs on an SSD, the spikes on my HDD are less than 100 iops :smiley:
Logs and DBs are the worst part and should really be put on an SSD (ideally a mirrored SSD pair, but SSDs are really cheap; you just need 2x 64 GB).
The file walker at the start of the node is a pain in the … though, as it causes very high iowait, but after that, piece of cake.

1 Like

Goals are nice to have, but not all can be achieved.

My claim here is: if Storj grows a few orders of magnitude, there won’t be enough spare capacity in home nodes, and hence I expect a significant majority of the nodes that still have free space to be hosted in data centers. These will outcompete the few home nodes still wanting to fill their free space.

Doesn’t have to be a goal, but I think this will happen with the current incentive structure.

Great, that means data center costs will also be lower.

This is a wrong assumption. Just because I only advertise 4 TB to the network doesn’t mean I wouldn’t be willing to buy another 10 TB HDD when it makes sense. So the network can easily grow to more than twice its size within a few weeks. We’ve seen that with Chia, and demand for Storj space will never grow that quickly.

2 Likes

yeah, it’s just problematic to really gauge it in iops; the higher one gets in iops, the more the disk latency spikes.
running in the upper ranges of iops can cause backlog spikes of up to multiple seconds.

also keep in mind not all iops are equal: a disk may be able to do 1200 sequential read iops, but maybe 1/10th of that if all the blocks being accessed are located far apart, or if one has a lot of random io.
also these numbers are a generalization, but they shouldn’t be far off for most consumer / enterprise sata hdds.

i can without a doubt clearly see my successrates go up and down… had a time when some vms ran out of memory and were swapping, which caused my successrates to drop down to 70%

most of the time tho, the drop isn’t more than a few % down from 99.9%.
i would expect most to see that if running under optimal conditions; ofc many things could create bottlenecks, but personally it’s always my disks’ workload.

Perhaps not, but they do determine what Storj Labs is likely to do. And they will definitely mean that running nodes in data centers is a last resort kind of thing.

You provide absolutely no evidence to support the statement that home nodes won’t have enough space, while everything Storj Labs has seen during both the V2 and V3 networks shows that when usage goes up, so does the available storage space. The upside of high usage is that running a node becomes more lucrative, which will automatically attract new node operators and incentivize older node operators to expand. There is plenty of unused space around the world to serve massive amounts of customers; it’s just a matter of getting that space added to the network. And so far there has never been a shortage of node operators. Keep in mind that Storj Labs also has tools like surge payouts to provide an incentive for potential node operators to give it a try. When nodes fill up, many SNOs (including myself) will just expand their nodes as well.

I think my previous message already specified that they will be unlikely to outcompete home nodes in any significant way. There is no such incentive structure that would benefit data center nodes enough to compensate for the considerable expenses of running nodes in a data center.

No, it doesn’t. Nobody builds a data center without storage redundancy, power redundancy and connection redundancies. Data centers require significant investments in all these redundancies as well as cooling systems, power failover, high speed connections etc. etc. Yes, you could in theory skip all that for Storj nodes, but then you are putting all your eggs in one basket and building a single purpose data center, which is just a super bad investment. Because if you already plan to run something at the scale of a data center, you could make much more money by just offering services directly to customers. You wouldn’t have to share profits with Storj Labs either. And I didn’t even mention having to pay for the building/location itself…

Furthermore, I fully expect the incentive structure to evolve in such a way that it will be profitable to share free space you already had, but not profitable if you have to buy additional equipment for it. Ideally it would be barely profitable enough to cover the purchase of HDDs and HDDs alone. So if you have to buy a computer for it (other than maybe a Raspberry Pi), forget about ever getting a return on that. If you have to keep a server online for Storj alone (again with the possible exception of a Raspberry Pi if power is cheap)… probably not worth it. You’ll make your money on HDD purchases back in, say, 2-3 years and that’s it. Try building a data center with that incentive structure… it just can’t be done. There may be some larger setups that do a kind of ghetto setup with racks full of Raspberry Pis or super low-powered servers with lots of HDD expansion cards etc. I mean, we’ve all seen the warehouses with massive amounts of PlayStations used for mining. People are going to try shit on a budget. But they would be stupid to pay for peering with high-speed interconnects, because that would simply not be worth it. And so your node would still be on the same level as those setups for winning piece races. And with piece allocation being random, you still get your fair share. These setups can’t possibly suddenly outscale home nodes, because that would quickly drop profitability for them to 0 as well.

The reason the goals for the network matter is that they are codified in the many design decisions made while building the network. As a result, the setups that were never targeted to begin with don’t make sense to run on the network that resulted.

1 Like

While I don’t expect Storj to host nodes themselves in data centers, node operators will do that, and some already do.

Chia itself has shown that a significant growth in storage needs broke supply for home customers, but not for professional data centers. They can respond to changes in demand faster, and that’s where my reasoning comes from.

Have you seen all the cardboard-based bitcoin “data centers”? Well… have you seen the almost-cardboard-based OVH “data center” that burned recently? People have already done that kind of thing as soon as the demand was there.

That’s not the case now, and it will not be the case if Storj needs to grow. While I do expect changes to the incentive structure, if Storj wants to reach even just a few exabytes, this is not going to happen on just the storage people happen to have already, because it’s not going to be enough to meet demand peaks.

Besides, I wonder how many current nodes would disappear if the incentive changed against them.

Regarding the topic itself, consider the following. Hetzner has close to 9k /24 blocks (source). From their pricing structure you will see that it is possible to rent servers with between 64 and 224 TB of non-redundant disk space at 1.375 USD/TB/month; assume that’s their operating cost after all the wages, hardware cost, power usage, cooling, etc. Assume 5% egress monthly, which is what we’re already observing, making each TB stored bring in 1.5 USD + 1 USD in Storj payments. This means that Hetzner would earn 1.125 USD/TB/month hosting nodes on their own hardware. If they chose to host 9k nodes exposing 224 TB of disk space each, all of them on a separate /24 block, that’s 2 exabytes yielding a profit (not just revenue) of about 2 M USD monthly. This is not even specializing the hardware to Storj needs, just using what they offer off the shelf.
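To make the arithmetic explicit, here is the same calculation spelled out; all inputs come from the assumptions in the paragraph above (the 20 USD/TB egress payout is simply what makes 5% monthly egress equal 1 USD per TB stored).

```python
# Back-of-the-envelope from the paragraph above; all inputs are that
# paragraph's assumptions, not measured figures.
blocks = 9_000            # /24 blocks, one node per block
tb_per_node = 224         # TB of non-redundant disk per rented server
cost_per_tb = 1.375       # USD/TB/month assumed operating cost
storage_payout = 1.50     # USD/TB/month for data at rest
egress_ratio = 0.05       # 5% of stored data egressed per month
egress_payout = 20.0      # USD/TB egressed -> 0.05 * 20 = 1 USD per TB stored

revenue_per_tb = storage_payout + egress_ratio * egress_payout  # 2.50 USD
profit_per_tb = revenue_per_tb - cost_per_tb                    # 1.125 USD

total_tb = blocks * tb_per_node                                 # 2,016,000 TB ~ 2 EB
print(f"profit per TB/month: {profit_per_tb:.3f} USD")
print(f"total capacity:      {total_tb / 1e6:.2f} EB")
print(f"monthly profit:      {total_tb * profit_per_tb / 1e6:.2f} M USD")
```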

They’re not doing it because Storj is way too small for that yet. And Hetzner is far from the largest data center operator out there.

Those are the numbers I base my beliefs on.

Well, that explains it, because that is not a realistic comparison at all. Chia had no limits to growth and started out completely unbalanced. You could add whatever space you wanted, whenever you wanted. With Storj you will have to wait until you actually get data and until you are vetted. New nodes will not impact existing nodes at all, and if new nodes get added en masse suddenly, it will take ages for them to be vetted. They would be completely unprofitable and would likely leave before they ever impact existing nodes. Even if they do make it out of vetting, the ingress is now spread over many more nodes, and while existing nodes have existing data to make money off, those new nodes can only make money on new data and each gets a super small piece of the pie.

Storj is starting out balanced and Storj Labs is putting effort into ensuring the growth is balanced as well. Furthermore, market effects will ensure this balance is maintained too. Customers start storing more data and running a node becomes more profitable and attracts more SNOs. More SNOs come in, nodes get less profitable and that stops the inflow of new nodes. It sorts itself out. Additionally, profitability does not rely on token value. So sudden spikes in value won’t lead to massive influx of node operators.

Chia is so completely different that this can be easily rejected as valid evidence. Past experience with the Storj networks is a much more reliable prediction and shows the balance I was referring to.

You clearly didn’t read my entire post before responding, as I referred to ghetto setups later on. They won’t have much of an impact on your nodes performance as their existence relies on similar cost factors as home setups would. They can only survive as long as they don’t have higher costs than my home NAS which was already always on. And I don’t really see how they can do that with single purpose hardware. Let them bring it… I’m sure they’ll drop out before I will as I’m sure they will have higher cost per TB than I have.
But referring to the OVH data center is out of place here, since that was a data center with all the required redundancies and overhead costs that I listed before (though they could have invested in better fire prevention). Hosting nodes there would definitely not be profitable long term.

It is actually. With current network performance, if you buy a 16 TB HDD now and run a node on it, it takes 2 years to break even; an 8 TB, about a year and a half. That’s not taking into account the power costs of running the setup or any other hardware. This is already very close to only being profitable if you run existing hardware or the most low-power hardware possible. It definitely won’t cover renting/buying a building any time soon. And keep in mind we are being overpaid with the current payout structure; Storj Labs is making a loss on every byte stored and downloaded atm.

We can make this all very complicated, but it’s a self-balancing system and the intended target for SNOs has basically no costs at all. I started out running a node on existing hardware with no upfront investment. I measured that the difference between running a node on the system and not is about 5 W. Good luck trying to beat the cost of only 5 W in any data center. Later on I started adding HDDs. Ideally, that would still be profitable over many years. I don’t really care whether it would be profitable for Hetzner right now; the fact is that it won’t be long before it’s no longer profitable for me. Their pricing is also based on the assumption that the majority of space will sit there unused. So you can’t just calculate the per-TB price and assume they make a profit on it if you use it all, because a massive amount of their profit comes simply from most customers not even using half.
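For reference, that 5 W works out to very little per month; the electricity price below is just an assumed example.

```python
# Marginal cost of the ~5 W figure mentioned above.
# The electricity price is an assumed example, not a quoted rate.
watts = 5
hours_per_month = 730          # ~24 * 365 / 12
price_per_kwh = 0.20           # USD, assumption

kwh = watts * hours_per_month / 1000        # ~3.65 kWh/month
print(f"~{kwh:.2f} kWh/month -> ~${kwh * price_per_kwh:.2f}/month")
```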

They don’t do it because there will never be enough slack to both scale this up and remain profitable. Wanting scale works against them significantly here, because the more scale they add, the less profitable it becomes. Backblaze recently investigated whether Chia would be profitable for them and arrived at a similar conclusion: the self-balancing nature of such systems would end up having them compete against much cheaper setups, and it would not work out for them long term. Storj has a lot more systems in place to keep things from getting off balance, like no token fluctuations and never starting out with a massive imbalance. As a result, it will likely never even reach a situation where a data center could jump in to make even a temporary profit. It doesn’t help that even if a sudden inflow of customers makes things more profitable, vetting and collecting data in the first few months will prevent new nodes from benefitting from that, and by the time they are vetted, there are many more nodes to take their piece of the pie. There is just no opening to ever really get in at such scale.

And in the meantime, all my nodes’ revenue is still profit… so they can never beat my ROI anyway.