From a client perspective, the Tardigrade network has high availability (HA) by design, because the pieces of each file are spread across many nodes.
On the SNO side, however, the architecture is a single node with no HA. If a node dies, Tardigrade customers are not affected, but the SNO is in trouble: the node gets disqualified and the held-back amount is lost. The operator may well not rejoin the network afterwards.
An HA distributed architecture on the SNO side could be implemented as follows, taking a three-node cluster as an example:
- a node configured for HA keeps a single identity, but the cluster gives satellites three different IP/port mappings, one for each node in the cluster
- satellites keep a pool of destinations for a given identity: they try the first one and, if it is not available, the second one, and so on; only when no node answers is the request considered a failure
- the node receiving a command (e.g. PUT, GET or DELETE) executes it locally and then replicates it to the other two nodes, much as a satellite would; the replication could be handled asynchronously
- the satellite that sent the command knows that a node is handling the request and does not forward the same request to the other nodes, so integrity is preserved
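To make the flow above more concrete, here is a rough Python sketch of it. Everything in it is hypothetical and only illustrates the idea: `ClusterNode`, `send_with_failover` and the in-memory `pieces` dictionary are names I made up, not anything from the real storagenode or satellite code. I also assume that only PUT and DELETE need to be replicated, and a plain queue stands in for the asynchronous replication.

```python
from collections import deque


class ClusterNode:
    """One member of a hypothetical three-node HA cluster sharing one identity."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.pieces = {}        # piece_id -> bytes, stands in for the disk
        self.peers = []         # the other members of the cluster
        self.outbox = deque()   # commands waiting to be replicated

    def apply(self, command, piece_id, data=None):
        """Execute a command against local storage only."""
        if command == "PUT":
            self.pieces[piece_id] = data
        elif command == "DELETE":
            self.pieces.pop(piece_id, None)
        elif command == "GET":
            return self.pieces.get(piece_id)
        else:
            raise ValueError(f"unknown command: {command}")
        return None

    def handle(self, command, piece_id, data=None):
        """Entry point for a satellite request: apply locally, queue replication."""
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        result = self.apply(command, piece_id, data)
        if command in ("PUT", "DELETE"):  # assumption: reads are not replicated
            self.outbox.append((command, piece_id, data))
        return result

    def flush_replication(self):
        """Send queued commands to reachable peers (stand-in for a background task).

        Offline peers are simply skipped here; a real design would have to
        retry or resynchronise them later.
        """
        while self.outbox:
            command, piece_id, data = self.outbox.popleft()
            for peer in self.peers:
                if peer.online:
                    peer.apply(command, piece_id, data)


def send_with_failover(endpoints, command, piece_id, data=None):
    """Satellite side: try each endpoint of the identity in order."""
    for node in endpoints:
        try:
            return node.name, node.handle(command, piece_id, data)
        except ConnectionError:
            continue  # this endpoint is down, try the next one
    raise ConnectionError("no node of this identity answered")


nodes = [ClusterNode("node-a"), ClusterNode("node-b"), ClusterNode("node-c")]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]

served_by, _ = send_with_failover(nodes, "PUT", "piece-1", b"payload")
nodes[0].flush_replication()   # asynchronous replication catches up
nodes[0].online = False        # the first node dies
fallback, data = send_with_failover(nodes, "GET", "piece-1")
print(served_by, fallback, data)
```

In this toy run the PUT lands on the first node, gets replicated, and the later GET is served by the second node after the first one goes offline. The satellite only ever talks to one node per request, which is the point of the last bullet.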
That’s just an example; there are many systems that implement symmetric distributed HA with no single point of failure by replicating the storage on the backend side.
Another approach could be to implement HA only on the SNO side, leaving the satellite software untouched, by introducing a load-balanced architecture for the storage nodes.
This approach does not duplicate the HA on the client side; it complements it. Think about a typical enterprise storage environment, where HA techniques exist at every layer of the architecture: in the application, in the middleware, on the servers and in the storage itself with RAID. No one would decide, for example, to remove RAID redundancy from a storage box because HA is already handled by the application. A robust environment needs specific HA at every critical layer of the architecture.
If storage operators can provide a higher level of availability, Tardigrade users will benefit from it, and therefore Storj will benefit too. It’s a win-win-win.