Centralization of traffic via Gateway-MT, as the connection is no longer from the client but via the gateway

So I thought I would post an interesting observation of my node's usage over the past 3 months, and reach out to Storj to get their view on this behaviour.

Throughout this post I am ignoring repair traffic and dev load data, as they skew everything.

The issue I am focusing on is the claim that "the storage nodes that successfully store segments, out of the list generated by the satellite, are effectively selected by the customer - whichever nodes have the best connection and performance win, and you get an early cancellation / late drop of the client connection if your node doesn't respond quickly enough".

TL;DR - 1) My node no longer sees the above as true. With Gateway-MT, selection is strongly biased towards nodes that are geo-located closer to the Gateway-MT servers, out of the list of nodes returned by the satellite. The connection is no longer from client to SNO; instead the client, from anywhere in the world, connects to a few Gateway-MT servers located in highly performant but centralised datacentres, and from the Gateway-MT the connection is made to the SNO. This favours the SNOs who are geographically, and ISP-connection wise, closer to the Gateway-MT servers, and penalises nodes on more latent links that can't respond as quickly - impacting the geographical distribution of data.

  1. It's now impossible to geo-block countries on my firewall for data uploaded to my node, as most of it now appears to come from the pool of Gateway-MT servers - how will Storj deal with compliance with local country laws when nodes can't filter for themselves?

  2. Gateway-MT is clearly really popular (good), and I love Storj for its decentralized nature, but we have now introduced highly utilised, centralized single points of failure that can impact node selection and piece distribution. S3 compatibility for me is definitely more of a backup / long-term storage option, as opposed to low-latency transactional access - how long before we have a beta of gateways that can be hosted by SNOs?

/end TL;DR

More info

The information I'm quoting is from my own node's experience, which I accept might not be typical for the network, and I accept there are mistakes in my assumptions and my English, so sorry for that - my setup is very basic, but it manages to work, and I can see from the forum there are far more advanced setups out there, definitely at enterprise scale.

background (not complete)

All my node traffic is pre-processed and logged by a pfSense firewall cluster, and sent to my dedicated Logstash cluster for processing (not just for Storj).

I've got some Logstash rules on there to enrich the metadata around IP addresses and connection state with advanced GeoIP, among other bits.
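
For illustration only, this is roughly the kind of GeoIP enrichment those rules do, sketched in Go rather than Logstash config - the geoip2-golang reader, the GeoLite2 database path and the struct shape are all assumptions here, not my actual pipeline:

```go
// Illustrative only: enrich a connection record with country/city from a
// local GeoLite2 database, similar in spirit to the Logstash GeoIP filter.
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/oschwald/geoip2-golang" // assumed reader for MaxMind .mmdb files
)

// ConnEvent is a hypothetical enriched connection record.
type ConnEvent struct {
	RemoteIP net.IP
	Country  string
	City     string
}

func enrich(db *geoip2.Reader, ip net.IP) (ConnEvent, error) {
	rec, err := db.City(ip)
	if err != nil {
		return ConnEvent{}, err
	}
	return ConnEvent{
		RemoteIP: ip,
		Country:  rec.Country.IsoCode,
		City:     rec.City.Names["en"],
	}, nil
}

func main() {
	db, err := geoip2.Open("GeoLite2-City.mmdb") // assumed local database path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ev, err := enrich(db, net.ParseIP("203.0.113.10")) // documentation IP, not a real gateway
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s -> %s / %s\n", ev.RemoteIP, ev.Country, ev.City)
}
```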

I've also got my Storj node's docker logs integrated into Logstash using a docker shim and Filebeat.

The output is sent into an Elastic cluster, and visualized on my Kibana server.

The purpose of all the log processing is some work I am doing on IDS, to allow for dynamic firewall control around block/allow decisions and also packet shaping of traffic, based on a more intelligent rules engine (not relevant here, but it explains why I've bothered).

One of the side effects of the above project is that I'm able, with a high degree of accuracy, to map a client connection to a Storj node action - be it put / get / delete - and furthermore map it to the satellite and segment ID, and ultimately the file location.
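
To give an idea of what that mapping looks like, here is a minimal sketch in Go of the correlation step - all struct names, fields and example values are hypothetical, and the real join is done inside Logstash/Elasticsearch rather than in code:

```go
// Pair a firewall connection record with storagenode log events from the same
// remote IP seen within a small time window. Purely illustrative.
package main

import (
	"fmt"
	"time"
)

type FirewallConn struct { // one pfSense connection/state log entry (assumed shape)
	RemoteIP string
	Seen     time.Time
}

type NodeEvent struct { // one parsed storagenode log line (assumed shape)
	RemoteIP  string
	Action    string // put / get / delete
	Satellite string
	PieceID   string
	Seen      time.Time
}

// correlate returns, for each firewall connection's remote IP, the node events
// seen from that IP within +/- window of the connection timestamp.
func correlate(conns []FirewallConn, events []NodeEvent, window time.Duration) map[string][]NodeEvent {
	out := make(map[string][]NodeEvent)
	for _, c := range conns {
		for _, e := range events {
			d := e.Seen.Sub(c.Seen)
			if d < 0 {
				d = -d
			}
			if e.RemoteIP == c.RemoteIP && d <= window {
				out[c.RemoteIP] = append(out[c.RemoteIP], e)
			}
		}
	}
	return out
}

func main() {
	now := time.Now()
	conns := []FirewallConn{{RemoteIP: "203.0.113.10", Seen: now}}
	events := []NodeEvent{{RemoteIP: "203.0.113.10", Action: "put", Satellite: "us1", PieceID: "example", Seen: now.Add(2 * time.Second)}}
	for ip, evs := range correlate(conns, events, 10*time.Second) {
		fmt.Println(ip, "->", len(evs), "matching node events")
	}
}
```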

#cut

So, the purpose of this post: 3 months ago my cancel / fail rate was 0.1-0.9% over a 1-week period.

My traffic was predominantly from:

  • uplink clients or a self-hosted S3 gateway, from all over the world ~ 70% of traffic
  • transfer.sh ~ 10% of traffic
  • Gateway-MT ~ 20% of traffic

Now, in August 2021, this traffic has changed - really excitingly, but also with consequences.

My traffic is now:

  • uplink clients or a self-hosted S3 gateway, from all over the world ~ 30% of traffic
  • transfer.sh ~ 5% of traffic
  • Gateway-MT ~ 65% of traffic

It is great to see Gateway-MT being so popular, but the usage pattern on my node for S3 data, which seems to be very much cold storage, isn't great :slight_smile: What I'm seeing:

  • failed / cancelled uploads
  • big segment uploads from a small pool of IPs, where we can't control (or tell) who the client is
  • huge deletes of data after 7/14/30 days, so I'm assuming backup jobs
  • no / limited egress traffic, as there is no need to read the data if it's backups
  • very full trash cans with GBs of data that we aren't compensated for when the jobs are deleted
  • excessive disk IOPS from the move-to-trash code - I get it, but when we are talking GBs going into the trash, that is a write / read that wears on the disk, not to mention the fragmentation

Discuss :smiley:

CP

Where did you get that statement from? AFAIK node selection is completely random, but since not all nodes are needed, the fastest ones win - which are of course the closest ones. So the closest nodes within that random selection are used, but there is no geo-targeted selection of nodes.

(Sorry, I didn't read the long rest of your post though…)

You're correct. I will find the exact quotes and update my first post with references, as I want to focus on the network issue of latency from a Gateway-MT server to an SNO's node.

Good data!

My first feeling about the problem is that, while S3 as a protocol is a second-class citizen in the Storj world (requiring a gateway, as opposed to connecting directly to nodes), it is going to be the most-used approach by everyone except those who need all the performance.

Yet I think there's a potential risk, over a long-term horizon, that rational node operators with worse connectivity to these central points of distribution will simply filter, at the firewall level, any connection attempts from them.

TBH, I was already thinking of extending the storage node code so that it would track the number of successful and failed attempts to store data from each IP… let's say, from each /24 block ;-), and if the ratio between these two values crosses some threshold, drop the connection before any data gets sent.
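
Something like this rough sketch of the bookkeeping - not actual storagenode code, and the type names and thresholds are made up for illustration:

```go
// Count successful vs failed uploads per /24 block (or /64 for IPv6) and
// decide whether to refuse new uploads from that block before data is sent.
package main

import (
	"fmt"
	"net"
	"sync"
)

type blockStats struct {
	success, failed int
}

type UploadFilter struct {
	mu       sync.Mutex
	stats    map[string]*blockStats
	minTotal int     // don't judge a block before this many attempts
	minRatio float64 // drop if success/(success+failed) falls below this
}

func NewUploadFilter() *UploadFilter {
	return &UploadFilter{stats: map[string]*blockStats{}, minTotal: 100, minRatio: 0.05}
}

// block maps an IP to its /24 (IPv4) or /64 (IPv6) key.
func block(ip net.IP) string {
	if v4 := ip.To4(); v4 != nil {
		return v4.Mask(net.CIDRMask(24, 32)).String() + "/24"
	}
	return ip.Mask(net.CIDRMask(64, 128)).String() + "/64"
}

// Record notes the outcome of one upload attempt from this IP.
func (f *UploadFilter) Record(ip net.IP, ok bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	s, found := f.stats[block(ip)]
	if !found {
		s = &blockStats{}
		f.stats[block(ip)] = s
	}
	if ok {
		s.success++
	} else {
		s.failed++
	}
}

// ShouldDrop reports whether a new upload from this IP should be refused early.
func (f *UploadFilter) ShouldDrop(ip net.IP) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	s, found := f.stats[block(ip)]
	if !found {
		return false
	}
	total := s.success + s.failed
	if total < f.minTotal {
		return false
	}
	return float64(s.success)/float64(total) < f.minRatio
}

func main() {
	f := NewUploadFilter()
	ip := net.ParseIP("203.0.113.10")
	for i := 0; i < 200; i++ {
		f.Record(ip, i%100 == 0) // simulate ~1% of uploads from this block succeeding
	}
	fmt.Println("drop new uploads from", block(ip), "?", f.ShouldDrop(ip))
}
```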

The gateway may also be used as a stepping stone by many customers. It's easy to change a bit of code in your application to switch to another S3-compatible provider, but a lot more of an investment to rebuild code to work with libuplink. Even the customers that intend to do that long term probably start out by using the gateway. In the end this has to be a customer choice though. For many, the compromises of the gateway will be worth it. But the long-term benefits of direct connections will definitely matter to others.

What's the use of this? Even if only a small amount of data ends up on your node, that's still an amount you can make money on. Are your connections being saturated? Is your HDD struggling to keep up or something? If not, I really don't see the point of messing with any transfers.

Yes, I agree - that's what I was trying to say, badly. I'm seeing a very large swing of customers to Gateway-MT, and I want us to think about the implications of that for decentralised storage, now that we have two central points of failure - the S3 gateway and the satellite.

Yes, apparently that's what customers are choosing despite the downsides. But I think we should give them some more time to adapt their projects to the native implementation. They may still switch, but it will require a little more time investment.

Imagine you could identify an IP range where all uploads coming from that range had an accept rate of 1%. So for each 1 GB of ingress, only 10 MB is effectively stored. Does it then make sense to even try accepting that data? You'd effectively be wasting bandwidth that could be used to serve other uploads more effectively.

This is, for now, a theoretical situation, but if Storj becomes a few orders of magnitude larger, I believe this might be a rational choice for some node operators.

I don't think so, as the number of nodes scales with that growth. It's unlikely that your node will get significantly more traffic. So yeah, if there is no bottleneck, I don't care if it costs a bit more bandwidth to store data from far-away IP blocks. I'll take what I can get.

Again, imagine Storj being a few orders of magnitude larger. Then most storage will sit in proper data centers with good connectivity to Gateway-MT or other centralized points, because at scale it will be worth paying for data centers. The small players can't compete on latency or speed with data centers, so they will only have a chance of winning races within, let's say, a close geographical area. Maybe within a single ISP, maybe within a large city.

The payment for cold storage is still fine, so the more customer data the better, IMO.

Personally I haven't seen any issues with success rates dropping; mine hover in the 99% range most of the time… Generally it depends on my storage activity: if I limit the activity I get 99.9% on almost everything except uploads, because I force them directly to storage before the sync acknowledgement is done.
ZFS with sync=always set on the dataset (zfs set sync=always) - sort of similar to turning off your HDD's write cache in Windows, but not quite.

If I set sync=standard then my upload success rates also go up to 99.9%, while they are usually more like 99.7% max, again depending on storage activity.

As far as I can tell there has been no change in my success rates since I finished fine-tuning my setup, so I think it's unlikely that it's the gateway's fault… or I'm in a favourable geolocation.

We have tested ingress in the past, and each node running in optimal conditions would get exactly the same ingress… but it has been a while since we last tested it, and it would be fairly easy to retest.

@CutieePie are you sure your storage isn't running into IOPS limitations?

That can only be based on a complete misunderstanding of both how the network works and what the goals of the network are. Nodes are selected at random, so even if half of the nodes were in data centers with super-high speed, you would still get plenty of data. But it is no one's goal to have data centers as nodes. Storj is optimized to work with low-powered nodes that do not require the massive expense of a data center.

It's not what the network is built for, and even if it did happen it wouldn't significantly impact small node operators. These are all hypothetical fears based on false premises.

Possibly not enough IOPS? How many more do you recommend? I've only got a small amount of hardware for Storj, as we're meant to be using what we have, so I'm not really wanting to add more :frowning:

#edit - had to post this update, as I hadn't been checking - I have a pool dedicated to the Storj node; this made me smile, I think the Grafana scale on the graph is wrong :slight_smile:

Nice one! Your node(s) apparently want to live that long :slight_smile:

How big is your node?

Generally about 400 IOPS' worth (1 HDD) should be plenty, or it is for me at least… but I do have a SLOG SSD.

The SLOG SSD is really important for ZFS, as without a SLOG, ZFS will double the write IOPS for sync writes because the ZIL sits on the pool disks.
One can also move the databases to an SSD, but with an L2ARC or plenty of RAM, ZFS will cache them anyway.

Also, newer nodes seem to have more files on the same amount of storage, so newer nodes will see more IOPS than older nodes… not sure if that is a trend or just how it happens to be for now…

It does make sense to put the worst workloads on the newer nodes: if they drop out, the network will already have gotten the advantage, and if people create many nodes they will need a lot of hardware.

Stuff like an SMR HDD could have problems, as uploads are the majority of the traffic for a long time after creating a new node, but it does sound odd that your system wouldn't be able to keep up by default…

But maybe move your databases to an SSD - I think that was the go-to solution for reducing Storj IOPS - or add an L2ARC if your pool is ZFS.

With logs and DBs on an SSD, the spikes on my HDD are less than 100 IOPS :smiley:
Logs and DBs are the worst part and should really be put on an SSD (ideally a mirrored SSD pair, but SSDs are really cheap - you just need 2x64GB).
The file walker at the start of the node is a pain in the … though, as it causes very high iowait, but after that, piece of cake.

Goals are nice to have, but not all can be achieved.

My claim here is: if Storj grows a few orders of magnitude, there won't be enough spare capacity in home nodes, and hence I expect a significant majority of the nodes that still have free space to be hosted in data centers. These will outcompete the few home nodes still wanting to fill their free space.

Doesn’t have to be a goal, but I think this will happen with the current incentive structure.

Great, that means data center costs will also be lower.

This is a wrong assumption. Just because I only advertise 4 TB to the network doesn't mean I wouldn't be willing to buy another 10 TB HDD when it makes sense. So the network can easily grow to more than twice its size within a few weeks. We've seen that with Chia, and demand for Storj space will never grow that quickly.

Yeah, it's just problematic to really gauge it in IOPS; the higher one pushes the IOPS, the more the disk latency spikes.
Operating in the upper range of a disk's IOPS can cause backlog spikes of up to multiple seconds.

Also keep in mind that not all IOPS are equal: a disk may be able to do 1200 sequential read IOPS, but maybe a tenth of that if the blocks being accessed are located far apart, or if there is a lot of random I/O.
These numbers are a generalization, but they shouldn't be far off for most consumer / enterprise SATA HDDs.

I can without a doubt see my success rates go up and down… There was a time when some VMs ran out of memory and started swapping, which caused my success rates to drop down to 70%.

Most of the time though, the drop isn't more than a few % down from 99.9%.
I would expect most to see that if running under optimal conditions; of course many things could create bottlenecks, but personally it's always my disks' workloads.