Architecting a Decentralized GitHub Backup

Alexey · October 18, 2019, 10:50pm

peem · October 19, 2019, 3:27am

Great idea, congratulations

The next one should be “registry-1.docker.io”, because it often times out.

kevink · October 19, 2019, 5:33am

It’s a great idea, but Storj is not a truly decentralized and independent platform either. If the operator of a satellite decides to shut it down, all data stored using that satellite is lost. Therefore storing data on Storj also has a single point of failure (the operator of a satellite), who is also either an individual or an agenda- or profit-driven organisation, just like GitHub itself.
So while the idea is nice, the GitHub backup will not be available on the network forever with no possibility of shutting it down, since you only need to shut down the satellite operator.

If I got that wrong, please correct me.

peem · October 19, 2019, 12:25pm

For now it is as you describe, but everything is still under construction, and maybe someday we will see true decentralization…

Alexey · October 19, 2019, 12:27pm

You will be able to use your own satellite to store files on the network after the production launch.

kevink · October 19, 2019, 12:38pm

I understand that but it still makes me the single point of failure. If I decide to shut it down, the data will be lost for everyone. It is basically the same as one person/organization running multiple datacenters hosting their data. One person/organization is in complete control of the data.
So if the goal is to make the github backup available to everyone “forever”, then the risk is high that this single point of failure (the satellite operator) might decide (or be forced) to shut down his satellite, making all data unavailable.
But that is only looking at one technical aspect. No matter how it is implemented, someone will have to pay for the data being stored, and that is probably an even bigger point of failure. As soon as nobody pays for the data anymore, it will get deleted anyway.

BrightSilence · October 19, 2019, 7:25pm

Sure, but that’s not really within reach for everyone. And it would also create another copy on the network, which is not really necessary with the protections already in place.
This would be a very good use case for some federation system between satellites, so that the data could be accessible through multiple satellites.

Vadim · October 20, 2019, 7:59pm

Are there separate payments for satellite providers? What are the requirements?

kevink · October 20, 2019, 8:01pm

Off-topic, please create a new thread

Knowledge · October 28, 2019, 6:48pm

There is risk. Is the risk high? Probably not higher than the risk that comes from the technology design being new and prone to undiscovered weaknesses. That being said, if you are able to operate your own satellite, and thus several satellites, you will have the ability to create the redundancy you need as you see fit. Is that ideal? Perhaps not, and commercial entities may also emerge to make the process easier as things continue.

You also have to remember, we’re early days here, and not everything we want is going to be ready at launch. For a small company, the milestones achieved are significant. And these problems will eventually be solved.

kevink · October 28, 2019, 6:55pm

Don’t get me wrong, STORJ is great and it is designed as a different approach to datacenters. And in that way having a company run a satellite is just fine.

Only when the goal is to store data in a decentralized way to preserve it “for eternity”, so that it will always be accessible to everyone, is the concept of STORJ unsuitable, because you have a single point of failure: the satellite operator. Nobody can say whether the organization or individual in control of a satellite will still run that satellite in 20 years, while a truly decentralized network could easily survive 100 years if enough participants stay.

Vadim · October 28, 2019, 6:56pm

Do satellites store only their own information, or do they sync all databases?
Because if a satellite holds only its own information, this is a bottleneck, and if the satellite dies, all data will die with it, as there will be no access to it.

Knowledge · October 28, 2019, 7:47pm

It depends on how you view it. Let’s say I am using Microsoft Azure. If I want to store data on their servers with local redundancy, I pay a low fee. If I want to store data with zone or geographic redundancy, I pay larger fees. Storj can do this as well. You can store your data on one or more satellites. It just depends on your cost factors and your need to have data be redundant.

This, of course, would require tools to make this kind of redundancy happen, otherwise you would have to upload the files multiple times to different satellites. Ideally, businesses that leverage Storj will create such tools.
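A minimal sketch of what such a tool might look like, assuming each satellite is reached through some client that exposes a simple upload call. The `Uploader` type, `replicate`, and the in-memory stand-ins are hypothetical names for illustration only; none of this is the Storj API.

```python
from typing import Callable, Dict, List

# Hypothetical uploader type: takes (key, data) and raises on failure.
# Each callable stands in for a client bound to one satellite.
Uploader = Callable[[str, bytes], None]


def replicate(key: str, data: bytes, uploaders: Dict[str, Uploader],
              min_success: int = 2) -> List[str]:
    """Upload the same object via every uploader; return the names that succeeded.

    Raises RuntimeError if fewer than min_success uploads succeed.
    """
    succeeded: List[str] = []
    for name, upload in uploaders.items():
        try:
            upload(key, data)
            succeeded.append(name)
        except Exception as exc:  # one satellite failing must not stop the others
            print(f"upload via {name} failed: {exc}")
    if len(succeeded) < min_success:
        raise RuntimeError(
            f"only {len(succeeded)} of {len(uploaders)} uploads succeeded")
    return succeeded


def make_memory_uploader(store: Dict[str, bytes]) -> Uploader:
    """In-memory stand-in for a real satellite client, for demonstration only."""
    def upload(key: str, data: bytes) -> None:
        store[key] = data
    return upload


def broken_uploader(key: str, data: bytes) -> None:
    """Stand-in for a satellite that is offline."""
    raise ConnectionError("satellite unreachable")
```

With three uploaders where one is unreachable, `replicate` still succeeds as long as two copies land, which is the same trade-off as paying for extra redundancy tiers: each additional satellite costs another full upload.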

kevink · October 28, 2019, 9:26pm

That’s true. There are ways to store data even more redundantly so that it doesn’t depend on one satellite. That is a little beside the point of having a decentralized network that is already storing all the data safely and redundantly, but the possibility is there.

Vadim · October 29, 2019, 10:32pm

Storj has redundancy for the data, but do you have redundancy for the satellite?
If the RAID on a satellite server goes crazy, it will wipe out all data, including the operating system. I have seen that happen.
A backup will save the situation, but not fully, as backups are made once or twice a day, and the data written in between amounts to many gigabytes.

keleffew · October 29, 2019, 11:53pm

The satellite peer class is designed to perform the more intensive operations (like audit, data repair, node reputation, and billing & payments). The peer class is designed to be run over a cluster of high-availability servers and distributed across many regions. The only single point of failure is that you have to trust the entity who runs them (however, you are always free to run one yourself).

As mentioned, anyone can run a Satellite on the Storj network. Providers might want to operate a Satellite as a service, or a company may want to run its own Satellite to build its own decentralized cloud storage network. There are likely many other reasons someone may choose to operate their own Satellite. Due to this, we expect there to be a large variation in quality.

Over time, the idea is to offload these functions into the network itself, or to decentralized compute networks as such solutions emerge. We are working on releasing a roadmap that details the pathway to decentralizing the core functions of the satellite.

As defined in the whitepaper, the Satellite instance is made up of these components:
• A full node discovery cache (section 4.6)
• A per-object metadata database indexed by encrypted path (section 4.9)
• An account management and authorization system (section 4.12)
• A storage node reputation, statistics, and auditing system (section 4.13)
• A data repair service (section 4.14)
• A storage node payment service (section 4.16)

Our primary focus is building a permissionless, decentralized network that competes with Amazon S3 on Price, Performance, and Security.

The satellite trust boundary enables the network to achieve millisecond response time amongst a globally distributed network of untrusted storage nodes. Storage nodes may connect to multiple satellites and must trust their operators for gathering paid demand and performing billing and payment functions.

Alexey · October 30, 2019, 8:00am

If a satellite operator uses cloud services to run a satellite, (s)he can configure the satellite services as Docker containers and run them in Kubernetes or Docker Swarm.

The DB service is usually rented from the cloud provider as well. In that case you have snapshots both of the whole service and of the database, so you can recover the DB to almost the moment of the failure. But usually the DB service is configured with at least a standby replica, so there is no downtime at all and no data loss. The probability of both replicas going down in different availability zones is very low.
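To put a rough number on "very low", here is a small sketch. It assumes replica failures are independent, which is an assumption and not something stated above; correlated causes such as a provider-wide outage or an operator mistake would raise the real figure. The 0.1% per-zone figure is purely illustrative.

```python
def prob_all_down(p_down: float, replicas: int) -> float:
    """Probability that all replicas are down at once, assuming independent failures."""
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_down ** replicas


# Illustrative figure only: each zone is unavailable 0.1% of the time.
single = prob_all_down(0.001, 1)
pair = prob_all_down(0.001, 2)
print(f"one replica down: {single:.6%}, both down: {pair:.6%}")
```

Under these assumptions, adding a standby replica in a second zone cuts the chance of a total outage by a factor of a thousand.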

This is just best practice for operating high-load production services. I doubt that Storj Labs would not follow it.

jocelyn · February 24, 2020, 8:52pm

It’s alive!