Distributed architecture for SNOs for high availability of storage nodes

Replication should happen at a different level: storage, not the application.
We should not reinvent the wheel. You can use Gluster or Ceph for that; both are robust these days and have a long history of development. That will be more reliable than introducing replication at the SN level, because we no longer have an implementation of replication.
Your node can only spread pieces across the network during Graceful Exit. I don’t think you want that in the case of a compute failure.

For protection against compute failure you can use Kubernetes or Docker Swarm. Those solutions are proven in production too.
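
As a rough illustration only, here is a minimal sketch of what that could look like on Kubernetes. It is not a tested or official setup. The claim names `storagenode-identity` and `storagenode-data` are hypothetical and are assumed to be backed by your replicated storage (Gluster, Ceph, or similar); the image name and mount paths are my assumptions and should be checked against the current node setup docs. Environment variables (wallet, external address, allocated space) and port exposure are left out on purpose.

```yaml
# Sketch only: a single storagenode pod that the cluster reschedules
# onto another host if the current one fails.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storagenode
spec:
  replicas: 1            # never run two instances with the same identity
  strategy:
    type: Recreate       # stop the old pod before starting a new one
  selector:
    matchLabels:
      app: storagenode
  template:
    metadata:
      labels:
        app: storagenode
    spec:
      containers:
        - name: storagenode
          image: storjlabs/storagenode:latest   # assumed image name
          volumeMounts:
            - name: identity
              mountPath: /app/identity          # assumed path
            - name: data
              mountPath: /app/config            # assumed path
      volumes:
        - name: identity
          persistentVolumeClaim:
            claimName: storagenode-identity     # hypothetical claim
        - name: data
          persistentVolumeClaim:
            claimName: storagenode-data         # hypothetical claim
```

The point of `replicas: 1` with `Recreate` is that the orchestrator handles failover of the compute, while the data itself stays on the replicated storage layer underneath.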

I don’t think we should reinvent those technologies in the SN; it would be more complicated and less reliable than separate, specialized solutions.
