Storj as CDN provider

I feel Tardigrade users now treat Storj like AWS S3 Glacier, which provides cheap archive storage. Hence, the storage supplied by node operators is filled up with low-value data and little egress traffic is demanded. Low rewards result in low incentive for node operators to provide high-availability, high-performance hardware (SSD RAID + 10GbE). However, Storj has great potential as a CDN provider like AWS CloudFront, Akamai or Fastly, where node operators could benefit from high-profit egress traffic.

I believe Storj can compete with those top CDN players thanks to its highly distributed nature. I hope Storj can one day be used the way AWS S3 Standard is, and gain a substantial business model. #HODL

8 Likes

I would very much like to see Storj become a CDN-type service as well… it is, after all, very similar, just a bit more like decentralized caching… I would assume that would be a "trivial" conversion in at least the fundamental aspects… of course there will need to be a lot of customer-facing stuff built, but I would be surprised if the Storj backend couldn't basically be a CDN backbone…

Not sure how this should be approached; it's a rather complex and long-term topic with a lot of moving parts… the only real issue I see is that Storj would essentially be competing with projects like VideoCoin, which I think they partnered with… but I could be wrong on that.

Of course, that might not mean there is a contractual obligation not to become a CDN themselves.

But maybe they are already doing that through VideoCoin…

It's an interesting idea, though.

If distributed storage is the future of storage… which it might be, for many things at least…
then distributed CDNs are what make the internet of the future go round.

This would play to a lot of StorJ's strengths, though they would need to make a separate pricing system for it.
Although, there isn't really any reason the CDN would need to be a Tardigrade product; in fact, it might be best if they used a separate platform for it to prevent confusion.

The CDN could use its own satellites (they could be hosted in the same locations if desired), and nodes could add them to their whitelist if they want to participate in the CDN. This would allow for an easier implementation of a different pricing structure from both ?StorJ/Tardigrade/The Company? and the SNOs for the end customers. It would also play to their infrastructure's strengths: they have StorJ for storage, Tardigrade for S3-style satellite access with billing and scale management, and another platform with its own services emphasizing CDN, built on top of StorJ storage. Furthermore, a different set of operating parameters could be used for CDN SNOs, for example prioritizing faster uplink speed over downlink.

1 Like

Yeah, totally agree. The customer base for a CDN will also be very different from the Tardigrade customer base, so it makes good sense to keep the customer-facing side separate.
This also allows the CDN project to fail separately, if such a thing should happen.
In regard to the uplink, I'm not sure it would make a huge difference; most likely it will be more latency-related, imo.

1 Like

Frankly speaking, I have no idea how anyone would implement a CDN on top of Storj nodes. A CDN has much tighter requirements than plain file storage. Firstly, you need customers to speak HTTP to nodes, which means the nodes would need to store full files, not just redundant stripes. Secondly, to keep latency low, you need customers to talk directly to nodes, not through a middleman like a satellite. Thirdly, nodes would need much better uptime and bandwidth.

Storj is not in a position to leverage standard CDN tricks like GeoDNS (serving different IP addresses to different customers for the same DNS name), nor can it put a load balancer close to the nodes.

Splintering the network is never a good idea, especially when that splintering is based on properties that would need to be verified and can be tricked. Storj has always fixed these things using distribution and overprovisioning, and probably always will.

This CDN use case has actually been addressed in the white paper, so I recommend everyone check that out. It's mentioned as a possible future upgrade, but the paper outlines a few issues, most of all scale. The current network is fine if a handful of users want to download data at the same time: there are 80 nodes with pieces, 39 of which are involved in each download (including overprovisioning), and nodes can handle a couple of transfers at the same time anyway. No problem. But thousands of downloads would mean the nodes holding those pieces also get hit with thousands of requests, which would probably take out quite a few nodes. So… could the network do something to deal with that? Yes!

The erasure coding system used for redundancy can also be used for wide distribution. Based on demand, repair workers could automatically kick in to create more erasure-coded pieces, so that maybe you only need 29 out of 1000 pieces to recreate the file, and suddenly the load is distributed among a lot more nodes. Of course, this also comes with an increase in storage costs, but since this only happens when lots of paid egress is happening, Storj Labs can probably swallow those costs and not bother the customer with them. Then, when demand drops, satellites could eventually issue deletes to some nodes to avoid paying for the extra storage long term.
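Just to put toy numbers on that (a back-of-the-envelope calculation, not Storj code; the 29-of-n figures echo the RS numbers above):

```go
package main

import "fmt"

// With k-of-n erasure coding, each download touches roughly k nodes
// (ignoring overprovisioning), so the expected hits per piece-holding
// node for one hot object scale as downloads * k / n. Widening n
// spreads the same demand over more nodes.
func requestsPerNode(downloads, k, n int) float64 {
	return float64(downloads) * float64(k) / float64(n)
}

func main() {
	const downloads = 10000 // simultaneous downloads of one hot object
	const k = 29            // pieces needed to reconstruct a segment

	for _, n := range []int{80, 200, 1000} {
		fmt.Printf("n=%4d pieces: ~%.0f requests per node\n",
			n, requestsPerNode(downloads, k, n))
	}
	// n=  80 pieces: ~3625 requests per node
	// n= 200 pieces: ~1450 requests per node
	// n=1000 pieces: ~290 requests per node
}
```

Going from 80 to 1000 pieces cuts the per-node load by more than an order of magnitude, which is exactly the effect described above.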

There are still other things that need to be tackled though. But at least this use case has been on their minds.

2 Likes

I don't believe this would split the network, although perhaps I should have said 'requirements' instead of 'parameters'. What I meant was that they might have a different service agreement for users who joined dedicated CDN satellites. It would also simplify using a different pricing structure (probably something that costs more for storage and less for egress?). I'm not sure what a good going rate would be, or what would be sweet enough to tempt satisfied customers to relocate.

@Toyoo, those are some good points about satellites introducing a degree of latency. Although, for larger files it shouldn't be too bad, because the satellite points users to the SNOs, which they then connect to directly (the reason we have to use port forwarding is so these otherwise-unknown connections can get through).

I know they were working on, or did have, an IPFS implementation using the existing setup? This could also be more or less compared to a CDN approach, but it isn't compatible with most existing browsers (and I believe IPFS URLs are still a pain).

Well, the original post wanted to compete with CloudFront or Akamai. They cope just fine with use cases involving large numbers of small and medium-sized files, and to be a proper competitor, Storj would have to as well.

Unless, well, Storj limited itself to large files (as it does now). But then I could hardly call it a competitor…

1 Like

How is that not a split of the network? You're literally separating CDN traffic out to a different satellite and a different set of nodes. My point is that this isn't necessary at all if automatic scaling of RS availability is implemented. All nodes could participate just like they do now.

As for pricing, CDN pricing usually deals only with bandwidth. This makes sense, as the data replication and distribution strategy the CDN takes has a massive impact on how much storage is used, and really shouldn't be the concern of the customer. I would say the same applies to a Tardigrade CDN. Storj Labs should be responsible for how dynamic scaling and distribution work, and shouldn't charge customers more just because those steps require more storage on nodes. Since CDNs usually lead to the vast majority of costs being egress-related anyway, that makes sense for Storj as well.

So my suggestion would actually be to ditch storage pricing for CDN use cases, or to stick with storage pricing as if there weren't additional storage being used to scale up availability. At $0.045 per GB, egress is actually competitive with other CDNs. The pricing only falls apart when you look at the volume discounts offered by the competition, though I'm sure Storj Labs could offer such discounts as well, to a certain extent. This means the pricing model actually already fits CDN use cases quite well, which is even more reason not to separate the two at the network level.
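For a sense of scale, quick arithmetic on that flat rate (illustration only; no volume discounts assumed):

```go
package main

import "fmt"

// Egress cost at the flat $0.045/GB rate mentioned above; not a quote,
// just arithmetic to show where volume discounts would start to matter.
func main() {
	const ratePerGB = 0.045 // USD per GB of egress
	for _, tb := range []float64{1, 10, 100} {
		fmt.Printf("%6.0f TB egress -> $%.0f\n", tb, tb*1000*ratePerGB)
	}
	// 1 TB -> $45, 10 TB -> $450, 100 TB -> $4500
}
```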

What's most important is to actually have a browser-based uplink to recombine the segments. I think the QUIC implementation being worked on right now could serve as a nice communication protocol for that. The way access grants work now, a read-only access grant could be created for the specific files and used by the browser. There are still quite a few challenges to overcome; there was a nice post outlining those challenges here: JS library for the browser - #7 by jtolio
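On the server side, the Go uplink library can already derive such a restricted grant today. A minimal sketch (the bucket and prefix names are made up for illustration):

```go
package main

import (
	"fmt"
	"log"
	"os"

	"storj.io/uplink"
)

func main() {
	// Parse the full access grant, e.g. from an environment variable.
	access, err := uplink.ParseAccess(os.Getenv("ACCESS_GRANT"))
	if err != nil {
		log.Fatal(err)
	}

	// Derive a grant that can only download and list under one prefix.
	// "videos" / "episode-42/" are hypothetical names.
	restricted, err := access.Share(
		uplink.ReadOnlyPermission(),
		uplink.SharePrefix{Bucket: "videos", Prefix: "episode-42/"},
	)
	if err != nil {
		log.Fatal(err)
	}

	serialized, err := restricted.Serialize()
	if err != nil {
		log.Fatal(err)
	}
	// This string is what a browser-based uplink would be handed.
	fmt.Println(serialized)
}
```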

Another route would be to use the multi-tenant gateway, hosted on a multi-region cloud, as the HTTP endpoint. This is especially important if you want to enable client applications without a full browser, like podcatchers, to download files as well.

I don't see how that's going to change. When delivering lots of small files, it's all about latency, and latency is pretty much the only issue you can't fix if you rely on a worldwide network of nodes on consumer connections and hardware. There is no way to compete with the performance of CloudFront and Akamai on that. However, there are plenty of use cases, like videos, podcasts and file sharing, where the importance shifts to bandwidth, and Storj can definitely compete on that front.

2 Likes

This is why I was suggesting using StorJ's ability to support multiple 'services' to do this. You essentially spin up a CDN service, likely with a default opt-in in a future update if desired. From an accounting point of view, keeping parts of the business side split would likely be good. It would still use 'StorJ' underneath. It is possible that some people would choose to participate in only one or the other, but as long as they have free capacity, it will likely be most profitable to be part of all the options to maximize growth.

As an aside, as things grow, we should probably work on terminology… I think of StorJ as the token, the company, and the software I'm running, which likely isn't good for clarity (and another reason to differentiate the customer front for a CDN). Something like:

StorJ = Token
Node = single instance SNO
Satellite = satellite
Tardigrade = Warm file storage
CDN name = CDN
SJ Labs = StorJ Labs (the company)

Also, we need some way to communicate that this isn't 'mining' (the public tends to think anything crypto is mining). SNOs are more like independent contractors who lease out and maintain storage, while Tardigrade currently distributes data and handles the business-side details (we're the Uber/Lyft drivers of the storage world, if you will). We are paid with a crypto token for a variety of reasons, but a large part derives from the distributed nature of the StorJ network and the logistical complexity of who knows how many national currencies and laws.

Another thing: I think it would be a good idea to enable support for the client to pay for bandwidth as well. That way someone could host a site at a predictable rate and with less upfront work. This solves two issues I can foresee: it limits the theoretical potential for runaway costs if something goes viral (I imagine StorJ Labs would work things out if it did happen), and it covers cases where one wants to make things available but not pay for the traffic (you would still need to front the storage, but that is a fixed cost).

It's my understanding that a CDN doesn't store all the data all the time; it's basically a cache. It would have some amount of allocated space, and depending on which data is the most requested, that data would be pushed to more nodes for additional bandwidth, or to nodes whose geolocation matches the demand.

Though it might run on the Storj software and on the same nodes, it would need to function in a vastly different way: something like a series of satellites running something like Tardigrade, but with tweaked parameters to function more intelligently and with less overhead, since temporary data doesn't need much redundancy and such…

And yes, the payout parameters would most likely also need to be different, but I dunno.
I'm pretty sure a gamer won't pay $5 to download their game faster… or they might…
but I doubt it's a sustainable model. Also, we have to consider that the download price is based on data being stored and maintained for customers on an indefinite basis; a geolocation-aware, demand-adjusting cache/CDN wouldn't have the same costs for downloaded data.

It would just be brilliant if the nodes could actually do more stuff.
Of course, for a new company it might be a stretch to implement, depending on how well their first use case is going.

Certainly an interesting topic. Sadly my knowledge about the whole thing is kinda limited, but Storj Labs has a powerful network of nodes across most of the world… I would imagine it would be quite powerful and would provide more incentive for node operators long term.

Another Storj Labs advantage is their already-built network of node operators; after all, it's mostly just code that's needed to make it viable.


addendum


This got me thinking… isn't this sort of like that gateway thing Storj Labs just added?
The idea being that it takes care of multiplying the upstream data to the network…
That would sort of be the same thing…

Maybe for Tardigrade to function optimally, a CDN front end might even be the optimal solution…
just like your computer doesn't send data directly to storage; there are caches in between to optimize performance.

1 Like

That's one way to go, sure. But why not simply provide this functionality as a setting for specific buckets or even individual objects? Something like a scale-on-demand setting. Perhaps it doesn't even need to be a setting, as the demand already guarantees enough egress income that the scaling will be paid for anyway. That way people can use it as a CDN, but it would also benefit customers with unexpected spikes in demand.

Not sure what you mean here. Opt in for who? And what would they be opting in to?

I don't know about that. What if I, as a customer, want to use both services in conjunction? I think it would be a unique strength to have a platform that can already do both. You could market it for each individual purpose, or for both, without having to separate the product.

As for terminology: Storj Labs is the company, and Storj is the decentralized network as a whole; it basically refers to the backend architecture. Tardigrade is the name of the services offered to customers, hence why I earlier referred to a Tardigrade CDN. The current service is technically object storage, not file storage. SNO and node are also not the same thing; Storage Node Operator refers to the operator of one or more storage nodes.
Side note: there is no capital J in Storj. It's pronounced like "storage", which I guess doesn't need an explanation.

You're exactly right that being an SNO is providing a service for compensation. It indeed isn't mining.

I'm not following this part. They are paying for bandwidth. Not sure what your suggestion is here.

I think Storj works best when the data is already there, but I see no reason why you couldn't build a system that uses Tardigrade as a cache when needed. If the CDN scaling capabilities are built into the platform, the existing uplink libraries already provide everything you need to code it that way.
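For illustration, a minimal origin-pull sketch with the Go uplink library (a hypothetical HTTP front end; the "assets" bucket name is made up, and a real edge would also write the bytes into a local cache):

```go
package main

import (
	"context"
	"io"
	"log"
	"net/http"
	"os"

	"storj.io/uplink"
)

func main() {
	ctx := context.Background()

	access, err := uplink.ParseAccess(os.Getenv("ACCESS_GRANT"))
	if err != nil {
		log.Fatal(err)
	}
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		log.Fatal(err)
	}
	defer project.Close()

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Map the URL path to an object key in the (made-up) "assets" bucket.
		download, err := project.DownloadObject(r.Context(), "assets",
			r.URL.Path[1:], nil)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		defer download.Close()
		io.Copy(w, download) // stream the reassembled object to the client
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```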

It still requires redundancy, since nodes go offline from time to time. But the redundancy kind of comes for free if you scale up by extending the number of erasure-encoded pieces. Other than that, if removing overhead is possible, why not do it for all types of customers?

This part makes no sense; the gamer never pays for the distribution. Pricing for bandwidth is already competitive with other CDNs. Even if you mean the cost that would be passed on to the gamer, that cost wouldn't be any higher than with other CDNs.

Geolocation awareness really isn't necessary for providing large files at high speeds; Storj does this by doing transfers in parallel. Latency really doesn't matter as long as the total throughput is high. The current system would already work perfectly if the scaling of erasure-coding pieces were implemented.

If I'm understanding correctly, this is basically putting a CDN in front of Storj, instead of using Storj as the CDN platform. Kind of defeats the purpose.

1 Like

The opt-in was in regard to how I was proposing to use separate satellites: SNOs need to have the satellite whitelisted (in this case, the whitelist is kind of like the opt-in).

This was poorly worded. It would be nice if Customer A could host data but not be charged for Customer B's download bandwidth when Customer B accesses Customer A's data (for example, two companies working on some project together, or with some sort of mutual agreement).

1 Like

The front end would be geo-aware caching, triggered by an increasing number of downloads of the same data, which would then be pushed to the geolocation(s) requesting it, to limit the required internet bandwidth utilization globally.

It's not meant to replace Storj, it's meant to augment it… the whole concept of a CDN is to decrease the global bandwidth needed for repeated data transfers… like a cache in a computer.

When the data isn't in the cache, or isn't repeatedly requested, it goes directly from Tardigrade.
In fact, the cache would start out more like a counter: the first few times data is requested from Tardigrade, it wouldn't even interact with the data, it would just count the requests and maybe record the geolocation / internet locality demanding it (see the sketch below).
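A toy sketch of that counting idea (the threshold, types and names are all made up; this is not how any Storj component works today):

```go
package main

import "fmt"

// Count requests per object per region, and only promote an object into
// a regional cache once it crosses a popularity threshold.
const promoteThreshold = 100 // requests before caching; arbitrary number

type counter struct {
	hits map[string]map[string]int // object key -> region -> request count
}

func newCounter() *counter {
	return &counter{hits: make(map[string]map[string]int)}
}

// record notes one request and reports whether the object just became
// popular enough in that region to be pushed to a cache there.
func (c *counter) record(key, region string) bool {
	if c.hits[key] == nil {
		c.hits[key] = make(map[string]int)
	}
	c.hits[key][region]++
	return c.hits[key][region] == promoteThreshold
}

func main() {
	c := newCounter()
	for i := 0; i < 150; i++ {
		if c.record("sj://demo/video.mp4", "eu-west") {
			fmt.Printf("promote to eu-west cache after %d requests\n", i+1)
		}
	}
}
```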

And one of the reasons it would be optimal to combine it with Tardigrade is that if the cache and the storage requests live in different locations, that adds extra latency.

Of course, there might be security concerns, because all the pieces would be stored in the cache. So it might be that not all data would be viable to access like that, since it might bypass one of the fundamental security compartmentalizations in Tardigrade.

Of course, one might just let the customer select whether data is allowed to be cached by the gateway.

Things like this could happen down the road, as the whitepaper points out. In the immediate term, though, the team is actively working on a number of initiatives to strengthen the existing system. There are performance metrics we want to hit, while at the same time maintaining network/service quality. As Storj improves technically, usage will improve as well.

8 Likes