Storage node selection

AFAIK storage nodes get selected based on quickest response. I believe it is fair to assume that the nodes that succeed will normally be close to the uploader.

But what if the uploader and the downloader are located in different regions?
For example: there is an application for worldwide use, but data uploading and processing is done mainly in, let’s say, India by a contractor.
Would that result in the data being uploaded mainly to nodes in the India region, potentially resulting in slow downloads for customers outside of that region? Or is there some algorithm in place that distributes the data nearer to the downloaders, where it belongs in such a case?

At least right now, your premise is wrong. Nodes are selected at random from a pool of online nodes that aren’t disqualified or suspended and have space available to store pieces.
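To make that concrete, here is a minimal sketch of that kind of selection. It is written in Python for brevity, not taken from the satellite code, and the `Node` fields and function names are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields; the real satellite schema is not shown here.
    node_id: str
    online: bool = True
    disqualified: bool = False
    suspended: bool = False
    free_bytes: int = 0


def is_eligible(node, piece_size):
    """Online, not disqualified, not suspended, and with room for the piece."""
    return (
        node.online
        and not node.disqualified
        and not node.suspended
        and node.free_bytes >= piece_size
    )


def select_nodes(nodes, count, piece_size, rng=random):
    """Pick `count` distinct eligible nodes uniformly at random."""
    pool = [n for n in nodes if is_eligible(n, piece_size)]
    if len(pool) < count:
        raise ValueError(f"need {count} nodes, only {len(pool)} eligible")
    return rng.sample(pool, count)
```

Note that nothing in the filter looks at location or response time, which is why the result is not biased toward the uploader's region.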

There has previously been talk about selecting pairs of nodes and picking the one with the better reputation, but from what I can tell this isn’t implemented. Reputation is currently not a single value; instead, audits check that files are present and correct, and uptime checks make sure your node is online. Neither of these is correlated with speed.
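For illustration only, the pair idea could look something like the sketch below. It assumes a single numeric reputation score per node, which, as said, does not exist today, and the names are hypothetical.

```python
import random


def pick_best_of_two(pool, reputation, rng=random):
    """Draw two distinct nodes at random and keep the better-scored one.

    `pool` is a list of node ids; `reputation` maps node id to a score.
    A single score per node is an assumption made for this sketch.
    """
    first, second = rng.sample(pool, 2)
    return first if reputation[first] >= reputation[second] else second
```

Because both candidates are still drawn at random from the whole pool, this only tilts selection toward better-scored nodes; it does not concentrate pieces in one place.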

After that, 110 nodes are selected and the upload runs until 80 of them have finished. That is a large enough group that those 80 will still be broadly spread around the world.
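The 110/80 step can be pictured with a small simulation. This is only a sketch with random finish times, assuming the slowest uploads are simply dropped; it does no real networking and the function name is made up.

```python
import random


def keep_fastest(finish_times, needed):
    """Return (kept, dropped) node ids given each node's finish time in seconds."""
    if needed > len(finish_times):
        raise ValueError("not enough nodes started the upload")
    ranked = sorted(finish_times, key=finish_times.get)
    return ranked[:needed], ranked[needed:]


# Simulated finish times for 110 randomly selected nodes.
rng = random.Random(42)
times = {f"node{i}": rng.uniform(0.2, 3.0) for i in range(110)}
kept, dropped = keep_fastest(times, 80)
print(len(kept), len(dropped))  # 80 30
```

Only the slowest 30 of the 110 are cut, so speed acts as a mild filter on a random worldwide sample rather than as the selection criterion.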

So right now, pieces are stored across the world. And even after the best-of-2 reputation method is implemented AND speed is incorporated into the reputation calculation, the nodes will still be all around the world.

Do you have a source for that? As an SNO, almost all my uploads and downloads come from my region.

If that is normal behavior, Tardigrade certainly needs some kind of CDN functionality that makes sure that download data is available close to the downloaders’ locations.

Why? Isn’t the data available close to the region, given that the requests for it come from that region? Storj is a CDN, so I don’t think it would make a whole lot of sense to add another one. Yes, it’s not an effective CDN. In fact, I’m planning to play around with the idea of using Storj as a CDN but haven’t had time yet.

Yes, it’s open source… my source is the source => storj/satellite/satellitedb/overlaycache.go at ccf4f9ed2dcf3c0d3f129c7daedb4e99b7b0d154 · storj/storj · GitHub

But what if the requests for that data come from a different region? This is what I am questioning.
If you are a software developer offering an application for download and you use Tardigrade to store it cost-efficiently, I was wondering whether there is, or could be, any impact in such a scenario if the data is stored close to the source instead of close to the recipients.

In the end this is one reason why CDNs exist. For good performance the data must be near to the download destination.

CDNs exist for large-volume use cases and to avoid choke points in the network. A few hundred milliseconds extra won’t be much of a problem if you’re the only one, or one of a few, downloading the files.

And I think we can all agree that Storj would need some changes to serve large-volume downloads of a single file. Storj is well positioned to expand into that use case, though: because of the distributed nature of the nodes, there are almost by definition no choke points, as all data is sourced from different locations. So upping the erasure coding settings so that many more nodes hold pieces for a single segment would be enough to scale to a CDN-style service.

This is what I had in mind. I was thinking of some really large software projects with millions of downloads worldwide, like kodi.tv or portableapps.com.

BTW: If I remember correctly, kodi.tv recently had issues with their hoster due to the traffic they cause. @super3: Maybe another opportunity for Storj to step in and offer help. This would certainly gain a lot of media attention.