Storage vs bandwidth

joep · December 14, 2019, 10:48am

Hi all,

Is there an estimate of the amount of bandwidth that would be optimal per TB of storage in use?
Or to turn it around: I have a 1Gbps unmetered uplink. What storage capacity would fit that?

Vadim · December 14, 2019, 10:55am

It is not possible to calculate. No one knows what data you will get or how often clients will download it back.
You can add as much storage as you can give. Do not buy any HDDs, as you would get that money back only after a very long time.
Use the storage that you have.

direktorn · December 14, 2019, 12:05pm

There is not. You will most likely never saturate your gigabit connection (if it’s symmetric). I don’t think that’s the most important part, as the chunks are so small that you would need hundreds of thousands of connections to make the connection a part of the formula.

It’s more important that your storage system can deliver the chunks quickly and, as I’m sure you understand, that involves I/O.

The most important part of the formula is where you live. If you have a 100TB storage node with 10 Gigabit fiber in the middle of the Sahara desert … well, that is not the same as having one in Manhattan, NY…

Vadim · December 14, 2019, 12:23pm

It depends more on where the customers are; if they are also in the middle of the Sahara, then it is all OK.

joep · December 14, 2019, 12:34pm

And those are all important variables, true.

But I was hoping some of the v2 experience could be translated into lessons learned. Stuff like:

for each 1 TB storage we estimate between x and y
these regions see the most traffic…
what are the (utilization) targets for 2020-H1?

I am assuming more can be said other than: it depends…

Vadim · December 14, 2019, 12:54pm

Can you predict what I will do today? I think not, so you understand that it is not predictable who will upload data, when, and how much, today or tomorrow.
If I tell you that you will get 5 terabytes of data today and you do not, tomorrow you will ask me why you did not get this data: “I invested in hardware and electricity and did not get any profit.” There are hundreds of factors that can be good or bad.

joep · December 14, 2019, 1:27pm

All very true, but I am not asking for the individual projection for my node. I am asking about the averages, lower and upper limits of the group as a whole or significant subsets of that group (like Europe vs Asia).
I am assuming Storj made estimations and would not launch in January if the available capacity would not meet expected demand.

Stuff like:

Based on the success of V2, we hope to have 2PB in the network in March with the majority of that in Asia.

A valid answer could also be: We do not share that information publicly to avoid setting expectations we might not be able to meet.

Alexey · December 14, 2019, 2:59pm

I can assure you, Storj Labs is as transparent as possible. They do not hide any information.

There are really no predictions, because almost all of the traffic so far was test data. The behavior would be completely different with real customers.

So, unfortunately there is no such equation at all. Even in v2.

deathlessdd · December 14, 2019, 4:02pm

I can give you an estimate based on what I’ve been through. Since May I’ve gotten 3.5TB on one node. It has been running 24/7 on a 1gig link with a 10gig internet connection, so there’s no limit on internet speed. It hasn’t come close to maxing out my 1gig yet, so don’t expect bandwidth to be saturated when Storj goes live. Unless people are using Storj like Netflix, streaming constantly off the nodes, I wouldn’t expect anything close to it, because once people upload something it’s going to be split across 2000+ nodes around the world and they may not access that file for months. So you can’t predict the exact usage that customers will generate; it could be super high or it could be super low with no usage. But right now it’s test data, so they know exactly where it is going to be.
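To put those figures in proportion, here is a rough sketch that converts 3.5 TB accumulated “since May” into an average line rate and compares it with a 1 Gbps link. The 214-day window is an assumption (the exact start date is not given), decimal units are assumed, and stored data is only a proxy for traffic: it ignores deleted data and all egress.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(terabytes: float, days: float) -> float:
    """Average rate in Mbit/s needed to move `terabytes` (decimal TB) in `days`."""
    bits = terabytes * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


def link_capacity_tb_per_day(gbps: float) -> float:
    """Decimal TB a fully saturated link of `gbps` Gbit/s can move in one day."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


stored_tb = 3.5      # figure from the post
assumed_days = 214   # ASSUMPTION: roughly mid-May to mid-December

avg = average_mbps(stored_tb, assumed_days)
capacity = link_capacity_tb_per_day(1.0)

print(f"average fill rate: {avg:.2f} Mbit/s")
print(f"1 Gbps link capacity: {capacity:.1f} TB/day")
print(f"share of the link used on average: {avg / 1000:.3%}")
```

Under these assumptions the node filled at an average of well under 1% of what the link could carry, which is consistent with the link never coming close to saturation.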

joep · December 15, 2019, 9:11pm

Ok thanks for the data. That’s the same I’ve been seeing.

I still think it’s weird there are apparently no sales (adoption) targets.

And the fact that the objects are split over so many machines makes sure everything averages out and you do not have to account for the traffic pattern of a single user. It’s a design that is good for reliability but also for predictability, the system has a good way of addressing any hot-spotting in the load by just routing requests to other nodes.

The system lends itself excellently to such analysis; it’s hard to believe none has been made.

6 copies of the data in the network
The encoding enlarges all data by a factor of 1.5 (I believe)
Each node gets data from many customers, such that everything averages out.
1TB of on disk storage is about 666GB of customer data.

So by: average data requested in V2 (per day) divided by the customer data in the network for that day
Take into account the copies and the encoding, and I believe you would have a ballpark estimate if the customers’ use cases stay the same. (Although I checked none of my assumptions above.)
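The ballpark above can be written out as a small calculation. This is only a sketch of the reasoning: the expansion factor of 1.5 is the unverified assumption stated above, the daily request ratio of 1% is a made-up placeholder rather than a measured V2 figure, and it assumes each requested customer byte causes one byte of egress in total across the nodes.

```python
SECONDS_PER_DAY = 86_400
EXPANSION_FACTOR = 1.5  # ASSUMPTION from the post, unverified


def customer_data_gb(disk_gb: float, expansion: float = EXPANSION_FACTOR) -> float:
    """Customer data represented by `disk_gb` of encoded data on disk."""
    return disk_gb / expansion


def egress_mbps(disk_gb: float, daily_request_ratio: float,
                expansion: float = EXPANSION_FACTOR) -> float:
    """Average egress in Mbit/s if `daily_request_ratio` of the customer data
    held on this disk space is requested each day."""
    gb_per_day = customer_data_gb(disk_gb, expansion) * daily_request_ratio
    return gb_per_day * 1e9 * 8 / SECONDS_PER_DAY / 1e6


def disk_tb_to_fill_link(link_gbps: float, daily_request_ratio: float,
                         expansion: float = EXPANSION_FACTOR) -> float:
    """Disk space in TB whose average egress would equal the link speed."""
    mbps_per_tb = egress_mbps(1000.0, daily_request_ratio, expansion)
    return link_gbps * 1000.0 / mbps_per_tb


ratio = 0.01  # HYPOTHETICAL: 1% of customer data requested per day

print(f"customer data per 1 TB on disk: {customer_data_gb(1000.0):.0f} GB")
print(f"average egress per 1 TB on disk: {egress_mbps(1000.0, ratio):.3f} Mbit/s")
print(f"disk needed to average 1 Gbps: {disk_tb_to_fill_link(1.0, ratio):.0f} TB")
```

With real values for the request ratio, the same three functions would answer the original question of how much storage fits a given uplink, at least on average.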

also:
What is https://storj.io/storage-node-estimator/ based on? There is no relation at all between storage and egress. A 0 TB node could earn me a few thousand in 15 months, just from egress traffic.

deathlessdd · December 15, 2019, 11:26pm

I wouldn’t go by that. It’s like a best case, unrealistic really, but it gives you an idea of what to expect if that is how much data you’re able to get. In reality that’s not how it normally goes: you shouldn’t expect to see that kind of data on your single node, ever. If you max everything out based on what you think your internet connection or your node can handle, you will be disappointed very quickly.

Alexey · December 16, 2019, 4:20am

We are in the v3 beta. There are no paying customers yet. Only early adopters, developers and testers.
You can’t make predictions on this basis, even if you had full statistics. The pattern in production will be different anyway.
I’m not sure it would be possible to predict anything in production, since our users are real people, not machines. I have no idea how to predict people’s behavior.

It works differently:

joep · December 16, 2019, 8:35pm

Hi Alexey,

So to sum it up:

The customer traffic/usage patterns observed in V2 cannot be projected into V3.
There is no independently developed expectation of what normal behaviour for the group of customers should roughly look like.
There is no upper (or lower) bound expectation regarding adoption or growth projection for Q1/Q2
There is no upper (or lower) bound expectation on what percentage of the stored data is requested per month. The SNO could encounter anything between 5% and 10000%.

Again, I am not expecting facts. We are obviously still in beta. But you plan these things, so that you only have to worry about the deviations.

It sounds like:

The preferred storage capacity at launch and in the first months after is an unknown
There is no initial expectation of normal behavior during launch, and hence no way to determine issues in this area.
The compensation for SNOs was a stab in the dark and we don’t know if it will be profitable or cost-covering.
No business, capacity or usage expectation went into the design and build of V3.

It’s harsh of me, but it sounds like you have an awesome engineering team and a bunch of wishful fairies on the business side of things.

I’d like to see this work and maybe build a small side business around being a SNO;
But if this is really how little effort Storj puts into the non-technical side of planning and forecasting a product, how can I justify gearing up towards any sort of investment in 2020? There is no way to steer if there is no course set.

BrightSilence · December 16, 2019, 10:40pm

Being an outsider I can’t really answer everything, but I can respond to a few things based on what has been communicated and what I know about the network. So I’ll give it a go.

Not really. The way V2 worked was not scalable and basically only allowed for short-term storage (3 months). Usage patterns on V3 can vary much more as a result of the flexibility the V3 network offers. It would work perfectly for both archival storage and CDN-like storage and activity. Because most of these usage patterns are new, predictions based on V2 are likely not representative.

I’ll try to respond to all of these in one go. As I mentioned, the varying use cases make it hard to predict actual usage patterns. The first customers are now being onboarded, but even those are likely still mostly running tests. Storjlabs may have some indication about expected usage patterns for their larger customers, but they will probably wait until there is an official announcement about these partnerships.

To a certain extent, I’m sure it is. That’s why both the storage node wait list and the developer wait list are gated so they have control over the increase in both supply and demand. I’ll skip the next point as it is similar to a few previous questions.

This one is very easy to answer though. SNOs are paid for bandwidth used and storage used, and customers are charged for the exact same thing, with a healthy profit margin in between, so Storjlabs makes money with every customer use of the product. The more it is used, the more money they make. This isn’t the Uber model where they lose money for years until it scales. Every byte stored as well as downloaded makes a profit margin. It’s very simple to calculate and comes down to roughly 50% for the SNO, a percentage for open source partners and the rest for Storjlabs. (Please don’t pin me down on these exact numbers, but that’s at least in the ballpark.)
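As a minimal sketch of that split: every price and the partner percentage below are hypothetical placeholders chosen for round numbers, not published figures; only the roughly 50% SNO share comes from the ballpark above.

```python
def customer_bill(stored_tb: float, egress_tb: float,
                  storage_price: float, egress_price: float) -> float:
    """Monthly customer charge given per-TB prices for storage and egress."""
    return stored_tb * storage_price + egress_tb * egress_price


def split_revenue(bill: float, sno_share: float = 0.5,
                  partner_share: float = 0.1) -> dict:
    """Divide a bill between SNOs, open source partners and Storjlabs.

    sno_share is the ballpark from the post; partner_share is HYPOTHETICAL.
    """
    sno = bill * sno_share
    partner = bill * partner_share
    return {"sno": sno, "partner": partner, "storjlabs": bill - sno - partner}


# HYPOTHETICAL usage and prices (currency units per TB), for illustration only
bill = customer_bill(stored_tb=2.0, egress_tb=0.5,
                     storage_price=10.0, egress_price=50.0)
shares = split_revenue(bill)

print(f"customer pays: {bill:.2f}")
for name, amount in shares.items():
    print(f"{name}: {amount:.2f}")
```

Because every share is a fixed fraction of the same bill, the margin stays positive at any usage level, which is the point being made about not needing scale to break even.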

A broad range of usage patterns went into the design of the platform. There are features built in to increase redundancy and distribution for CDN-like use cases as well as options for long-term storage. Storj is also testing a broad set of use cases themselves. Because of the way the network is set up it’s pretty hard to saturate any part of it, but I’m sure these measures are taken into account while inviting new developers on board.

While there may be ways to turn this into a side business, it will only work if you have several locations and different IP /24 subnets available. The thing is that the network is set up for decentralization, so every individual location can only make so much money. It’s not supposed to be hosted in data centers, but rather distributed across many small nodes in different locations. If you’re serious about turning it into a business, you may want to keep an eye out for the requirements for becoming a Tardigrade satellite operator. This will however mean you’ll need an HA infrastructure and you’ll be responsible for paying SNOs for their work. It’s much more involved than being a SNO.

I guess I’ll end this with… this is not a get rich quick scheme. It’s a solid technological and financial construction, which can operate in a free market by mostly using spare unused space on machines that are already online. Since Tardigrade pricing undercuts traditional cloud storage and takes something off the top for themselves, you’re being paid a fair amount for your efforts, but nothing extraordinary. This also means you’re unlikely to earn back a big hardware investment any time soon. You’re getting decent pay for otherwise unused online storage, but you’re not making them stacks.

heunland · December 16, 2019, 10:54pm

Not much I can add to that, other than pointing to our roadmap to show that this is not some boat aimlessly sailing the oceans without a set course. In fact, up to now, we have hit our milestones consistently and successfully (unlike many other projects in the space) and have been transparently publishing the details of everything we do as well as asking for feedback from our community. Furthermore, we invite you to review our business section of the Storj blog and check out the recaps of our prior townhall meetings to get an idea about what we have been working on and where our goals lie. Also, we have our next Townhall coming up in January which I would hope everyone here will sign up for and attend to ask their questions and find out what progress we have made this last quarter on staying on track with our production release and other milestones.

deathlessdd · December 17, 2019, 2:45am

Here’s the thing: Storj isn’t telling you to go invest money in hardware of any kind. It’s aimed at people who already have hardware that isn’t fully utilized and who want to make some money off their already paid-for hardware. The problem is that you’re looking at this as a datacenter for which you need to go buy datacenter hardware. That isn’t the case here; they don’t want people to spend money on hardware in a datacenter. This project is meant to be decentralized, not to centralize the data on the backbone of big datacenters.

direktorn · December 17, 2019, 10:41am

I would recommend you visit their YouTube channel (https://www.youtube.com/channel/UC-cTEqWwZV5Rl-h0RZsp2Qw/videos) and watch the “town hall” events, where they have shared plenty of information about how they intend to grow the network overall.

Any data would be useless for you, however; it might be relevant if you plan to host a few petabytes of data (1 PB = 1000 TB), as you would want to scale accordingly.

joep · December 17, 2019, 8:04pm

Thanks all.

Especially BrightSilence; that finally gave some background to the earlier statements.
It knitted the whole thing into a story line that I could match with the quality of the product.

I missed the last townhall, but I will definitely check out the YouTube channel. I did not know that existed.

Let’s see what production brings us