Thank you for such an awesome project and all the hard work everyone has put in thus far. Based on my understanding of the network and the available data, I have provided my feedback below.
One of the greatest shortcomings of the current method of measuring SN uptime, and the disqualification tied to it, is that it does not account for the fact that not all nodes are equal. Factors that differentiate nodes from one another:
- Age of node
- Amount of data currently stored on the node
- Audit history reputation
- Network quality of service (is this a highly used node due to geographic location and response time?)
- Node escrow balance
While the above factors should not all be weighted equally, they do need to be taken into consideration when calculating the disqualification threshold. Setting the threshold too high creates a situation where the certainty of dataset availability becomes looser, while setting it too low could cause extremely high node churn and become costly for the network.
As an example, let us use “Node A” and “Node B” with configurations as follows:
-Node A-
Age: 6 months
Disk used: 20TB
Egress: 1TB
-Node B-
Age: 2 months
Disk used: 500GB
Egress: 30GB
Node A has much more at stake than Node B: Node A has a far greater vested interest in keeping its node up “at all costs” to avoid disqualification, since its escrow balance is fuller, and it also wants to avoid going through the full process of setting up a new node and rebuilding ‘used disk’ on it. There are circumstances in which the operator cannot guarantee the node’s availability, such as a regional network outage. Additionally, instantly disqualifying both nodes in such an instance creates a monetary loss for the network, as it now needs to pay other nodes for 20.5TB of repair traffic; $205 in this situation. Unless enough time has gone by for Node A’s escrow balance to hold $200, this has now become a loss to the network.
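To make that arithmetic explicit (a back-of-envelope estimate only, assuming, as the figures above imply, a repair payout of roughly $10 per TB): the two nodes together hold 20TB + 0.5TB = 20.5TB, and repairing that data would cost the network about 20.5TB × $10/TB ≈ $205.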
With the above in mind, under the current model, since ingress traffic is ‘free’ (and there will be situations where free storage is provided to encourage network adoption), this creates a precedent for an attack on the network: nodes could be set up while ‘free’ data is uploaded to the network and then dropped, causing the remaining nodes to gain a monetary benefit through network repair traffic. The only workaround I can think of for this problem is to prioritize ‘free storage’ traffic to older, vetted nodes.
Clearly, the benefits of preserving nodes with high ingress/egress throughput rates and capacities do not need to be spelled out here. This needs to be calculated into the node disqualification threshold. (Note: throughput rate needs to be validated from the satellite side. Historical transfer information may be the best measure for total capacity. These two need to be mutually exclusive measurements.)
Since there are several reasons, from both a network-health and a monetary-protection standpoint, to retain ‘veteran’ nodes, I propose an uptime measurement system with a dynamic threshold based on the above five factors. The downtime threshold should not reset from month to month, as in the current system, but should use a trailing 45-day window. I would argue that, in the most extreme cases, it is reasonable to allow up to 48 hours of downtime for the most veteran nodes with the best ‘stats’ before complete network disqualification. The following are the recommended SN uptime calculation parameters (a rough sketch of how they could be combined follows the list):
i) As the age of a node increases, so does the probability that the node will come back online. Greater age should increase the disqualification threshold linearly.
ii) The greater the amount of disk a node has used, the more it is in the network’s best interest, from a monetary standpoint, to try to retain that node. This should increase the disqualification threshold linearly.
iii) Factor ‘i’ should be weighed against ‘ii’ to determine the cost of performing a network repair.
iv) Audit history reputation should reduce the threshold exponentially since data integrity is in question.
v) A “desired” baseline for node up/down throughput (Mbps) and total bandwidth capacity should be determined. If a node’s values equal the established baseline, there is no change factor. If they are below the baseline, the disqualification threshold is reduced based on how far below the baseline they fall; if above, the threshold is increased based on how far above they are.
vi) Taking all of the above into account, each node should have an established monetary “downtime cost per minute” attached to it.
vii) Once a node is “down” it starts losing time from its disqualification time threshold.
viii) The remaining disqualification time is calculated on a trailing 45-day window.
ix) Once a node’s disqualification time pool has been depleted, the value from ‘vi’ should be deducted from the node’s escrow account and absorbed by the network until the escrow account balance reaches 0.
x) Once a node’s remaining disqualification time and escrow balance both reach zero, the node is permanently disqualified.
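To illustrate how points i through x could fit together, here is a minimal sketch in Go. Every constant, weight, and name in it is an assumption chosen purely for illustration (the base and maximum allowed downtime, the per-month and per-TB bonuses, the exponent on the audit score, the $10/TB repair payout); it is not existing satellite code, only the shape of the calculation being proposed.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Illustrative constants only; real values would need to be tuned by the network.
const (
	baseAllowedDowntime = 5 * time.Hour  // assumed floor for a brand-new node
	maxAllowedDowntime  = 48 * time.Hour // cap for the most veteran nodes (per the proposal)
	repairPricePerTB    = 10.0           // assumed repair payout, USD per TB
)

// Node holds the per-node inputs to the calculation.
type Node struct {
	Age              time.Duration // factor i
	UsedDiskTB       float64       // factor ii
	AuditScore       float64       // factor iv: 1.0 = perfect audit history, 0.0 = worst
	MbpsUp, MbpsDown float64       // factor v
	EscrowUSD        float64       // current escrow balance
	DowntimeInWindow time.Duration // downtime observed over the trailing 45-day window
}

// allowedDowntime computes the dynamic disqualification threshold (points i-v).
func allowedDowntime(n Node, baselineMbps float64) time.Duration {
	// i) age raises the threshold linearly, e.g. +2 hours per month of age.
	ageMonths := n.Age.Hours() / (30 * 24)
	ageBonus := time.Duration(ageMonths * float64(2*time.Hour))

	// ii) used disk raises the threshold linearly, e.g. +1 hour per TB stored.
	diskBonus := time.Duration(n.UsedDiskTB * float64(time.Hour))

	// v) throughput relative to the desired baseline scales the threshold up or down.
	bandwidthFactor := math.Min(n.MbpsUp, n.MbpsDown) / baselineMbps

	// iv) a poor audit history shrinks the threshold exponentially,
	// because data integrity, not just availability, is then in question.
	auditFactor := math.Pow(n.AuditScore, 4)

	allowed := time.Duration(float64(baseAllowedDowntime+ageBonus+diskBonus) * bandwidthFactor * auditFactor)
	if allowed > maxAllowedDowntime {
		allowed = maxAllowedDowntime
	}
	return allowed
}

// repairCost estimates what the network would pay to repair this node's data (point iii).
func repairCost(n Node) float64 {
	return n.UsedDiskTB * repairPricePerTB
}

// downtimeCostPerMinute spreads the repair exposure over the allowed downtime (point vi).
func downtimeCostPerMinute(n Node, baselineMbps float64) float64 {
	return repairCost(n) / allowedDowntime(n, baselineMbps).Minutes()
}

// evaluate applies points vii-x to a single node for the current trailing window.
func evaluate(n *Node, baselineMbps float64) string {
	remaining := allowedDowntime(*n, baselineMbps) - n.DowntimeInWindow // vii, viii
	if remaining > 0 {
		return fmt.Sprintf("ok: %s of allowed downtime remaining in the window", remaining)
	}
	// ix) the downtime pool is depleted: charge further downtime against escrow.
	charge := (-remaining).Minutes() * downtimeCostPerMinute(*n, baselineMbps)
	n.EscrowUSD -= charge
	if n.EscrowUSD > 0 {
		return fmt.Sprintf("downtime pool empty: $%.2f deducted from escrow", charge)
	}
	// x) both the downtime pool and the escrow balance are exhausted.
	n.EscrowUSD = 0
	return "permanently disqualified"
}

func main() {
	// Roughly "Node A" from the example above.
	nodeA := Node{
		Age: 6 * 30 * 24 * time.Hour, UsedDiskTB: 20, AuditScore: 1.0,
		MbpsUp: 100, MbpsDown: 100, EscrowUSD: 150, DowntimeInWindow: 3 * time.Hour,
	}
	fmt.Println(evaluate(&nodeA, 100)) // assumed baseline of 100 Mbps
}
```

The exact weights would of course need tuning; the point is simply that age and stored data expand the downtime pool linearly, audit problems collapse it quickly, and once both the pool and the escrow are exhausted the node is gone for good.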
Best regards,
Nathan