Storage Capacity's Effect On Egress Bandwidth

Hi Guys!

This may seem like a rather obvious question, but how significantly does the amount of disk space given to the network affect how much egress data your node experiences? I won’t pretend to understand how the network works at its core, but my line of thinking is:

If there’s more data on your node, then there’s more data that can be downloaded from your node at any given time

meaning that, if you have more disk space, your node is going to experience more egress data. Is this a valid evaluation of how the network (sort of) operates? For the sake of understanding, assume bandwidth is not an issue (gigabit connection?).

I’d appreciate any insight here, thanks!

That’s basically accurate, but it’s also a little too early to tell. Right now egress is mostly driven by test traffic, and nobody knows what production use with only customers will look like. In general I think it’s reasonable to assume that if you have more data stored, there is more available to download. But I could also argue that it’s not unreasonable to think that recently uploaded data will be downloaded more often. It really depends on customers’ usage patterns.

With respect to your last point: that’s kind of what I was trying to get at. Who’s to say that a node with a small amount of available disk space (but a solid connection) couldn’t rack up just as much egress as a larger node with a “less solid” connection? I know this has been brought up before, so I guess I’m just curious whether there’s a somewhat definitive way of knowing how the network will behave based on a node’s parameters.

You can’t really control what kind of data your node gets. Say you get data from customers that don’t access their files for a few months, and you have a small node of, say, 500 GB versus someone with 6 TB of space: the smaller node is less likely to end up with as much used data as the bigger one.

A node with only 500 GB will fill pretty quickly, and it won’t receive any new data after it’s full. If the customers don’t access their files or delete them, you’re stuck with a full node.

But as of right now there is no way to predict how customers will use their data.

Right. While it may have been suggested before, is it within Storj’s means to have the network distribute a user’s data more intelligently (to different types of nodes) based on that user’s data utilization habits?

You could, but what do you base a person’s usage on if they’re new to the network? Unless when they sign up they pick “personal use”, “heavy use”, etc., but in the real world anyone can put anything. Until they start uploading, how exactly would the network know whether this person is going to be a regular user or someone who may or may not need to touch their data again in, say, a month or so? You can’t really predict a person’s usage before they actually use the network, right?

Yeah, I guess that’s the core problem with that theory. However, it’s not outrageous to think that a user’s data is initially distributed at random, and as they use the network further, their data is weighted towards nodes with a faster link speed. Or perhaps their data is mostly stagnant, so it’s directed towards nodes with a lesser connection (or those with a low bandwidth cap). They still pay for what they use; the network just becomes more efficient in handling their usage. Maybe it already does this (or is meant to down the line)? I haven’t read the white paper, so I’m not in a position to speculate. Just thinking out loud.

The nodes already do this in a way, since nodes race to receive data first. If you’re a heavy user sending lots of data, the more capable nodes will for sure get the data first; but if you’re a user that isn’t really uploading fast, the nodes don’t really have to fight to get the data.
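To make that race concrete, here’s a rough toy model in Go: the client pushes a piece to more candidate nodes than it needs, keeps only the first few that finish, and cancels the stragglers. The node names, latencies, and the “keep 3” count are invented for illustration; they are not Storj’s actual numbers or code.

```go
// Toy model of the upload "race": the client sends a piece to more nodes
// than it needs and keeps only the first successes, cancelling the rest.
// Node names, latencies, and the needed count are made up for illustration.
package main

import (
	"context"
	"fmt"
	"time"
)

type node struct {
	name    string
	latency time.Duration // stand-in for link speed / responsiveness
}

// uploadPiece simulates sending one piece to a node; it "succeeds" after
// the node's latency unless the context is cancelled first.
func uploadPiece(ctx context.Context, n node, done chan<- string) {
	select {
	case <-time.After(n.latency):
		done <- n.name
	case <-ctx.Done():
	}
}

func main() {
	// Hypothetical candidate set: a few responsive nodes and a few slow ones.
	candidates := []node{
		{"fast-1", 20 * time.Millisecond},
		{"fast-2", 30 * time.Millisecond},
		{"medium", 80 * time.Millisecond},
		{"slow-1", 200 * time.Millisecond},
		{"slow-2", 250 * time.Millisecond},
	}
	const needed = 3 // keep only the first 3 successful uploads

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan string, len(candidates))
	for _, n := range candidates {
		go uploadPiece(ctx, n, done)
	}

	// Collect the winners; the slowest candidates get cancelled ("long tail").
	var winners []string
	for len(winners) < needed {
		winners = append(winners, <-done)
	}
	cancel()
	fmt.Println("pieces kept on:", winners)
}
```

With these made-up numbers the two slowest candidates never make the cut, which is the “more capable nodes get the data first” effect described above.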

No. Available space makes no difference. You could offer 500GB or 50TB; the egress could be the same.
What matters is the USED space. If both nodes start at 0GB used, they are likely to fill up to 500GB in roughly the same time, possibly seeing the same egress.
Only once the 500GB node is full and the 50TB node has accumulated, say, 2TB of data is the big node likely to have more egress, because it has more data that could be downloaded (as discussed, it also depends on the customers and what kind of data they stored).
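A quick back-of-envelope sketch of that point: if both nodes receive the same ingest and egress is roughly proportional to stored data, their egress only diverges once the small node fills up. The ingest rate and the “fraction downloaded per month” below are completely made-up numbers, just to show the shape of the curve.

```go
// Back-of-envelope model: egress tracks *used* space, not offered space.
// The ingest rate and download fraction are assumed values for illustration.
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		ingestPerMonth = 250.0   // GB of new data landing on each node per month (assumed)
		egressFraction = 0.10    // fraction of stored data downloaded each month (assumed)
		smallCapacity  = 500.0   // GB offered by the small node
		largeCapacity  = 50000.0 // GB offered by the big node
		months         = 12
	)

	smallUsed, largeUsed := 0.0, 0.0
	for m := 1; m <= months; m++ {
		// Both nodes receive the same ingest; the small one stops growing once full.
		smallUsed = math.Min(smallUsed+ingestPerMonth, smallCapacity)
		largeUsed = math.Min(largeUsed+ingestPerMonth, largeCapacity)

		fmt.Printf("month %2d: small egress %5.0f GB, large egress %6.0f GB\n",
			m, smallUsed*egressFraction, largeUsed*egressFraction)
	}
}
```

With these numbers the two columns match for the first two months and diverge from month 3 on, once the 500 GB node is full.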


I saw a proposal to add a parameter to uplink (at bucket creation, if I recall correctly) to mark different types of data, like archival/backup, CDN, streaming, etc., before the actual upload begins, for better node selection.

Although I think it is not implemented yet and is just a plan for future network optimization.
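Purely as a thought experiment on what that could look like: if a bucket carried a declared data type, node selection could weight candidates differently for read-heavy versus archival data. Everything below (the types, the node fields, the scoring) is hypothetical and not part of uplink or the satellite today.

```go
// Hypothetical sketch of data-type-aware node selection.
// None of these types, fields, or weights exist in Storj today.
package main

import (
	"fmt"
	"sort"
)

type dataType int

const (
	archival dataType = iota // write-once, rarely read
	cdn                      // read-heavy, bandwidth-sensitive
)

type nodeInfo struct {
	name        string
	freeSpaceGB float64
	mbpsUp      float64 // upstream bandwidth, i.e. how fast it can serve egress
}

// score prefers bandwidth for read-heavy data and free space for archival data.
func score(n nodeInfo, t dataType) float64 {
	if t == cdn {
		return n.mbpsUp
	}
	return n.freeSpaceGB
}

func main() {
	nodes := []nodeInfo{
		{"big-but-slow", 40000, 20},
		{"small-but-fast", 300, 900},
		{"medium", 4000, 100},
	}
	for _, tc := range []struct {
		t    dataType
		name string
	}{{archival, "archival"}, {cdn, "cdn"}} {
		// Rank the candidates for this kind of data.
		sort.Slice(nodes, func(i, j int) bool { return score(nodes[i], tc.t) > score(nodes[j], tc.t) })
		fmt.Printf("%s -> preferred order: %s, %s, %s\n",
			tc.name, nodes[0].name, nodes[1].name, nodes[2].name)
	}
}
```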
