Let’s say someone develops a file hosting service like Google Drive. A school board makes a high-resolution welcome-back video for the students, which will all be viewed at the same time tomorrow. At 8:30, all the students go to class and they all watch the video. Now, the pieces for the first part of the video are being requested from the nodes that have them, but hundreds of classes are playing this video at once, and these nodes can’t keep up. How would the Storj network deal with this problem?
Another great question @itisyeetimetoday!!
The answer to this question can be found in section 6.1 of the whitepaper, Hot files and Content Delivery.
Basically, if a file gets hot we want to be able to create more pieces dynamically, so that the erasure coding scheme goes from 29/80 to 29/800, 29/8000, or even 29/8000000000000+ (if we are talking Game of Thrones hot!). Then Uplinks ping the Satellite and download hyperlocally from the closest nodes that have a file piece.
Description from Section 6.1 in Whitepaper here:
Occasionally, users of our system may end up delivering files that are more popular than
anticipated. While storage node operators might welcome the opportunity to be paid for
more bandwidth usage for the data they already have, demand for these popular files
might outstrip available bandwidth capacity, and a form of dynamic scaling is needed.
Fortunately, Satellites already authorize all accesses to pieces, and can therefore meter
and rate limit access to popular files. If a file’s demand starts to grow more than current
resources can serve, the Satellite has an opportunity to temporarily pause accesses if necessary, increase the redundancy of the file over more storage nodes, and then continue serving requests.
Reed-Solomon erasure coding has a very useful property. Assume a (k, n) encoding,
where any k pieces are needed of n total. For any non-negative integer number x, the first
n pieces of a (k, n + x) encoding are the exact same pieces as a (k, n) encoding. This means
that redundancy can easily be scaled with little overhead.
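The prefix property described above can be seen with a tiny toy model: each Reed-Solomon piece is just the data polynomial evaluated at one more point, so a (k, n + x) encoding extends a (k, n) encoding without changing the first n pieces. Here is a minimal sketch in Python over a small prime field (my own illustration for intuition only — real systems use GF(2^8) and optimized libraries, not Lagrange interpolation like this):

```python
P = 2**31 - 1  # a Mersenne prime; all arithmetic is mod P

def interp_at(points, x):
    """Evaluate at x the unique polynomial through the given
    (xi, yi) points, using Lagrange interpolation mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

k = 3
data = [11, 22, 33]                    # k data values = poly values at x = 0, 1, 2
base = list(enumerate(data))
piece = lambda x: interp_at(base, x)   # piece i = polynomial evaluated at i

pieces_6  = [piece(x) for x in range(6)]    # a (3, 6) encoding
pieces_12 = [piece(x) for x in range(12)]   # a (3, 12) encoding
assert pieces_12[:6] == pieces_6            # prefix property: old pieces unchanged

# Any k pieces recover the data -- here pieces 7, 9, 11 only:
sample = [(x, piece(x)) for x in (7, 9, 11)]
recovered = [interp_at(sample, i) for i in range(k)]
assert recovered == data
```

Because generating piece n + 1 never touches pieces 1 through n, a Satellite can add redundancy incrementally without rewriting anything on existing nodes.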
As a practical example, suppose a file was encoded via a (k = 20, n = 40) scheme, and
a Satellite discovers that it needs to double bandwidth resources to meet demand. The
Satellite can download any 20 pieces of the 40, generate just the last 40 pieces of a new
(k = 20, n = 80) scheme, store the new pieces on 40 new nodes, and—without changing
any data on the original 40 nodes—store the file as a (k = 20, n = 80) scheme, where any
20 out of 80 pieces are needed. This allows all requests to adequately load balance across
the 80 pieces. If demand outstrips supply again, only 20 pieces are needed to generate
even more redundancy. In this manner, a Satellite could temporarily increase redundancy
to (20, 250), where requests are load balanced across 250 nodes, such that every piece of
all 250 are unique, and any 20 of those pieces are all that is required to regenerate the file.
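To put rough numbers on the load-balancing claim: each download fetches k pieces of about 1/k of the file each, so with n unique pieces on n nodes, a single node serves roughly 1/n of the total traffic. A back-of-the-envelope sketch (the per-node bandwidth cap is a made-up assumption, not a Storj parameter):

```python
import math

def pieces_for_demand(total_demand_mbps, per_node_cap_mbps):
    """Each download pulls k pieces of ~1/k of the file in parallel,
    spread over n nodes, so per-node load ~= total_demand / n.
    Solve per-node load <= cap for the minimum n."""
    return math.ceil(total_demand_mbps / per_node_cap_mbps)

cap = 25  # assumed per-node upstream cap in Mbps (hypothetical)
n1 = pieces_for_demand(1000, cap)  # 1 Gbps of aggregate demand
n2 = pieces_for_demand(2000, cap)  # demand doubles
print(n1, n2)  # 40 80
```

This matches the whitepaper's example: doubling n from 40 to 80 halves the load on every node, because the same total traffic is spread over twice as many unique pieces.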
On one hand, the Satellite will need to pay storage nodes for the increased redundancy,
so content delivery in this manner has increased at-rest costs during high demand, in
addition to bandwidth costs. On the other hand, content delivery is often desired to be
highly geographically redundant, which this scheme provides naturally!
It would be great if there were a possibility to prepare for this in advance. For example, if you upload a video and you know 10,000 users will be watching it tomorrow, you could set a multiplier so the system is ready for that beforehand, instead of the network only detecting the demand while the event is happening. By then it is too late to start multiplying pieces, and it is a long process. We know the event is tomorrow, and the video is for this event.
I believe you can change the RS settings in the uplink, although the last time I looked at this was a long time ago. Since this would have an impact on how much storage you use, it impacts cost as well, so I’m not entirely sure how that would work at the moment.
But if you know beforehand the demand will be high you could prepare by increasing the initial RS settings as well.