Blueprint: Downtime Disqualification

I like the approach outlined in this doc, but I feel like the suspended state may last too long if you wait out the entire grace period + tracking period.

This would result in:

  • higher repair costs incurred (as unhealthy pieces don’t count towards the repair threshold)
  • higher chance of node churn, because SNOs may think it’s not working anymore
  • longer periods of no ingress for SNOs

In short, it’s bad for both sides.

It seems to me the problem with taking nodes out of suspension early is not so much the possibility of going in and out of suspension, but rather the resetting of the monitoring time frame. So I would suggest taking nodes out of suspension as soon as their downtime drops below the maximum allowed, but without resetting the monitoring window: the grace period + tracking period would still count from when the node was first suspended. During this time the node would be reinstated whenever its downtime is below the threshold, so it could receive new data and its pieces could be marked healthy, but it would be in an “under review” state. If it goes in and out of suspension a few times, it would not matter too much. When the grace period + tracking period since the initial suspension expires, the decision would be made to either reinstate the node completely (no longer under review) or disqualify it. This would limit the time the node is effectively not taking part in the network and reduce the repair needed as a result.
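
To make the idea more concrete, here is a rough sketch of the state handling I have in mind. This is not how the satellite works or how the blueprint defines it; the names (`Status`, `Node`, `evaluate`, `review_started`) and the way downtime is passed in as a single measured value are all my own assumptions for illustration. The key point is that `review_started` is set once at the first suspension and is not reset when the node recovers temporarily.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    GOOD = "good"                  # not under review
    SUSPENDED = "suspended"        # under review, downtime above the maximum
    UNDER_REVIEW = "under_review"  # under review, but reinstated for now
    DISQUALIFIED = "disqualified"


@dataclass
class Node:
    status: Status = Status.GOOD
    # Time of the FIRST suspension; hypothetical field, never reset mid-review.
    review_started: Optional[datetime] = None


def evaluate(node: Node, downtime: timedelta, max_downtime: timedelta,
             review_window: timedelta, now: datetime) -> Status:
    """Update the node's status. review_window = grace period + tracking period."""
    if node.status is Status.DISQUALIFIED:
        return node.status  # disqualification is final

    over_limit = downtime > max_downtime

    if node.review_started is None:
        # Not under review yet: start the window only on the first suspension.
        if over_limit:
            node.review_started = now
            node.status = Status.SUSPENDED
        else:
            node.status = Status.GOOD
        return node.status

    if now - node.review_started >= review_window:
        # Window since the initial suspension has expired: final decision.
        if over_limit:
            node.status = Status.DISQUALIFIED
        else:
            node.status = Status.GOOD
            node.review_started = None
        return node.status

    # Still inside the window: flip freely, but keep review_started unchanged.
    node.status = Status.SUSPENDED if over_limit else Status.UNDER_REVIEW
    return node.status
```

With this, a node that recovers on day 3 gets ingress again right away as `UNDER_REVIEW`, a relapse on day 5 just suspends it again, and the reinstate-or-disqualify decision still happens at the same point in time as in the original proposal.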
