Combining SNO with Satellite Node Operator

Right. Another scale-out object store I’m familiar with, Ceph, accomplishes this by having Monitors that keep a copy of the placement logic (the CRUSH map), so clients can algorithmically and consistently compute where a piece should live, both to “get” it from and to “put” it to the right place.
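This isn’t Ceph’s actual CRUSH implementation, but here’s a minimal sketch of the same idea using rendezvous (HRW) hashing: any client that holds the current node list can compute, with no central lookup, which nodes a piece belongs on. The node names and replica count are made up for illustration:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// score deterministically ranks a (piece, node) pair. Every client
// computes the same scores, so placement needs no central lookup.
func score(pieceID, nodeID string) uint64 {
	h := sha256.Sum256([]byte(pieceID + "/" + nodeID))
	return binary.BigEndian.Uint64(h[:8])
}

// placePiece returns the n highest-scoring nodes for a piece --
// rendezvous (HRW) hashing, the same basic idea CRUSH applies with
// far more elaborate hierarchy and weighting rules.
func placePiece(pieceID string, nodes []string, n int) []string {
	ranked := append([]string(nil), nodes...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(pieceID, ranked[i]) > score(pieceID, ranked[j])
	})
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}

func main() {
	// Hypothetical node IDs; a real cluster map would carry weights,
	// failure domains, etc.
	nodes := []string{"node-a", "node-b", "node-c", "node-d", "node-e"}
	fmt.Println(placePiece("bucket/object-42", nodes, 3))
}
```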

Storj calls them satellites, Ceph calls them monitors, but someone has to keep track of how to find all the chunks when a client wants to retrieve them, and where they need to go when a client wants to store them. I could see a case for a super scaled-down satellite colocated within the SNO docker container, so that each SNO acts as part of the satellite cluster itself… The biggest issue I see, though, is latency/time to converge, and because of that, split-brain. E.g., what happens if the US West Coast is taking forever to confirm pieces for EU-West because of trans-Atlantic cable issues, but all the colocated satellite processes on the US West and East Coasts have confirmed the pieces among themselves, just not back to Europe? Who’s “right” in saying the data is fully durable? Maybe Storj could spend some time with the folks over at Ceph, learn how some of these things are handled, and replicate some of it in Storj’s codebase. Maybe 2.x would have some fun features like that.
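For what it’s worth, Ceph’s monitors settle exactly that question with a Paxos-style majority quorum: a write only counts once more than half of the voters have acknowledged it, so a partitioned minority can never declare data durable on its own. A toy sketch of just the majority rule, with hypothetical voter counts:

```go
package main

import "fmt"

// durable reports whether a write may be declared durable: strictly
// more than half of all voting processes must have acknowledged it.
// A minority partition can therefore never declare durability on its
// own, which is what prevents split-brain.
func durable(acks, totalVoters int) bool {
	return acks > totalVoters/2
}

func main() {
	// 5 voting processes split 2 / 3 by a cable fault:
	fmt.Println(durable(2, 5)) // false -- the minority side must wait
	fmt.Println(durable(3, 5)) // true  -- the majority side may proceed
}
```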

Long and short of it, though: without satellites in some form, the whole thing just doesn’t work. Just like in a Ceph cluster… lose all your Mons, and you’re going to have a bad day/week/month/year.

2 Likes

You are slightly underestimating the satellite requirements. It contains not only the metadata, but also an audit system, payment processing, and a system for recovering lost data.
This is a cluster of servers, not one container.
Also, the SLA requirements are high: you should provide 9 nines to the customers and be able to pay storagenodes.

1 Like

I’ll take another read of the whitepaper to refresh my memory, but isn’t there already a quorum group within the satellite’s clustered servers to handle many of these processes? E.g., no one specific server always performs task A while another always performs task B; the tasks are shared among them. I’d be interested in exploring whether that could be expanded to operate in a geo-distributed manner as well.

1 Like

So multiple SNOs could make a cluster to run their own satellite, or whatever…
Let’s face it, the hardware issue is just a coordination issue. Without a doubt, the SNOs are the ones with the most processing power and the most bandwidth… so the hardware side is just a matter of how many SNOs it takes to run a cluster that is a satellite…

Of course, this wouldn’t be on a cluster of RPis… just so we are perfectly clear, lol.

Hasn’t there also been talk about something like this before? It was my understanding that it would become possible to run our own satellites… even if we might not have the gear to do so…

1 Like

Yes, it’s better to re-read the whitepaper. We explicitly decided against any consensus dependency,
see A. Distributed consensus, so there is no quorum or other consensus mechanism implemented.
In other words, all satellites are independent of each other, even ones belonging to the same owner.

It’s called SONM; there is also Akash Network.
The only problem is that they are insecure. As a provider of resources, I am able to see what the customer runs on my hardware; moreover, I can interfere with it, interrupt it, or shut it down right away.

Not exactly what I had in mind, but very similar, I suppose…
SONM: Supercomputer Organized by Network Mining.

I was just thinking of something a lot simpler, like having a few SNOs band together to run a satellite using something like a Proxmox or Xen cluster. With something like Xen, one would be able to monitor what is running on the cluster. Of course, those with access to it would have to be trusted… but that’s more of a selection issue.

But I’m sure the types of network you suggest might be viable for it, from a performance perspective at least… though I’m not sure how efficient a supercomputer over the internet would be.
Synchronizing and distributing loads for processing usually requires a fair amount of bandwidth,
which is usually a big problem when building supercomputers… to my understanding, at least.
A cluster would IMO make much more sense, both because it’s mature software and because the processing load, though distributed to a certain degree, would mostly be there for redundancy purposes. I’m not sure how the satellite software works, but I assume it cannot be split over multiple VMs or containers, and thus would need to be hosted on each system. Of course, each system would be part of the cluster, and the data would be balanced between them while they live-sync, I believe… but I know very little about clusters.

I’m not sure Proxmox can even run clusters at that level either, so it would have to be ESXi or Xen, but I’m not fully sure they can run it the way I think they can…

I know I, for one, would like to both run a cluster and run a satellite :smiley:
I’m still quite a bit away from running a good cluster setup…

1 Like

You just described Akash :wink:

That’s been called a cluster for longer than Akash has existed…

Yes, but not distributed the way you suggest.

Seriously though, I think you can just run a satellite, but it should be provisioned differently from a storagenode. You can try to run them both, but you will need an HA setup with a Kubernetes or Docker Swarm cluster and Ceph under the hood. Don’t forget about HA for networking and power supply too.

Clusters can distribute load across multiple data centers, at least to my understanding, when using Xen.
Yeah, well, I would need a new split PSU for my server to be fully redundant.

But that was kind of the idea of doing the cluster: it would make it so that no matter what goes wrong locally, another SNO participating in the cluster would take over.
And thus the satellite would be run from multiple geographical locations, which would provide power stability, hardware stability, and all that…

Sure, it may require some pretty hardcore servers, but still, how bad can it be… :smiley: It’s mostly just IOPS, right?
That’s exactly the kind of thing my system is good at, even if it is very old hardware.

Maybe a lot of RAM would also be good… I can only go to 288 GB; I’m not sure what’s worth it without upgrading the rest of the server.

The huge problem is the database server for metadata: it needs to be fast, and to stay fast even when distributed. I don’t know of any database server that can work fast over high-latency connections to storage (between two continents, for example), relative to a local network of course.
There are also security and trust problems.
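A back-of-the-envelope sketch of why: if each commit needs even two synchronous round trips to a remote replica, the achievable commit rate for a single session is bounded by the RTT, not by the database. The RTT figures below are assumptions, not measurements:

```go
package main

import "fmt"

// maxCommitsPerSecond caps the sequential commit rate of one session
// when every commit costs `rounds` synchronous round trips.
func maxCommitsPerSecond(rttMs float64, rounds int) float64 {
	return 1000.0 / (rttMs * float64(rounds))
}

func main() {
	// Assumed RTTs: ~0.2 ms inside a datacenter, ~80 ms across the Atlantic.
	fmt.Printf("local:          %.0f commits/s\n", maxCommitsPerSecond(0.2, 2)) // ~2500
	fmt.Printf("trans-Atlantic: %.0f commits/s\n", maxCommitsPerSecond(80, 2))  // ~6
}
```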

The problem is that the satellite is centralized. It does not matter how big the cluster running it is; the cluster has to belong to one operator, and it forms a completely separate network from all the other satellites, even if the satellites use the same nodes for storage.
It’s like running your own private torrent tracker.

But yeah, making a “distributed satellite” is not easy, which is why Filecoin and others that attempt to do so are much more complicated.

2 Likes

Well, the cluster solution is mostly just a copy. It can use VMs or something to distribute loads, but it’s mainly focused on being able to move VMs from datacenter to datacenter.

Otherwise the cluster concept works by having both servers act as mirrors… if one goes down, the other can pick up exactly where it left off… so they don’t really run with a lot of communication between them…

They run in a lockstep type of fashion.
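A minimal sketch of that mirror idea, with hypothetical names: a write is only acknowledged once both the primary and the standby have applied it, so the standby can take over with identical state. Note that this does imply a round trip of communication on every write; that is the price of lockstep:

```go
package main

import "fmt"

// store is one member of the mirrored pair.
type store struct {
	name string
	data map[string]string
}

// mirroredPair acknowledges a write only after both members have
// applied it, so the standby always holds identical state and can
// take over exactly where the primary left off.
type mirroredPair struct {
	primary, standby *store
}

func (m *mirroredPair) put(key, value string) {
	m.primary.data[key] = value
	m.standby.data[key] = value // synchronous: ack only after both apply
}

// failover promotes the standby; no acknowledged write is lost,
// because every acknowledged write already exists on it.
func (m *mirroredPair) failover() {
	m.primary, m.standby = m.standby, m.primary
}

func main() {
	p := &mirroredPair{
		primary: &store{"dc-east", map[string]string{}},
		standby: &store{"dc-west", map[string]string{}},
	}
	p.put("piece-1", "metadata")
	p.failover()
	fmt.Println(p.primary.name, p.primary.data["piece-1"]) // dc-west metadata
}
```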

So maybe we just solve the problem… :smiley: crowd-mind style.

In such a fashion you are always working with a leader database node, not with several, and thus you have a bottleneck.
The other nodes could still help to distribute the load and reduce latency, though.
Distributing database access services is not a problem; you can take a look at the storagenode :wink:
The only problem is storage. Each database node in such a configuration must have identical data to be both consistent and fast. Otherwise it would be either consistent or fast :wink:
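A sketch of that leader/replica shape, with hypothetical names: all writes funnel through the single leader (the bottleneck), while reads can be served by whichever replica is nearest (the load and latency win), provided every replica has been kept identical to the leader:

```go
package main

import "fmt"

type replica struct {
	region string
	data   map[string]string
}

type clusterDB struct {
	leader   *replica   // every write funnels through here -- the bottleneck
	replicas []*replica // reads can be served nearby -- the latency win
}

// write goes to the leader and is synchronously copied to every
// replica, keeping them identical (consistent, but each write now
// waits for the slowest replica).
func (db *clusterDB) write(key, value string) {
	db.leader.data[key] = value
	for _, r := range db.replicas {
		r.data[key] = value
	}
}

// read is served from a replica in the caller's region when one
// exists, avoiding a cross-continent round trip entirely.
func (db *clusterDB) read(region, key string) string {
	for _, r := range db.replicas {
		if r.region == region {
			return r.data[key]
		}
	}
	return db.leader.data[key]
}

func main() {
	leader := &replica{"us", map[string]string{}}
	db := &clusterDB{leader: leader, replicas: []*replica{{"eu", map[string]string{}}}}
	db.write("piece-1", "metadata")
	fmt.Println(db.read("eu", "piece-1")) // served locally in the EU
}
```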

1 Like

So I just have to invent quantum-entangled databases.
xD Oh wait… I think I just did… I just don’t know how to build it yet… :smiley:

2 Likes