After re-reviewing this whole thread, I’ve re-run the simulations with two changed assumptions and one added evaluation criterion:
- It’s fine to start with an alpha >= 1/(1-lambda).
- It’s fine to reset all node reputations as a one-time change.
- An added point of evaluation is “how long does it take to get from a perfect score to a DQ when (apparent) data loss is suddenly 100%?” We want this to be larger than 10 audits, but probably less than hundreds of audits.
Given those, I don’t see any set of parameters that does significantly better than @BrightSilence’s:
- DQ threshold=0.96
- initial alpha=1000
With those parameters (and lambda=0.999), it takes around 40 consecutive failed audits to go from a perfect score to a DQ. On a busy node, that is still not much time, but it is at least several times longer than it was. If we have the writeability check timeout added as well, this seems like it could be an acceptable situation.
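For anyone who wants to check the "around 40" figure, here is a minimal sketch. It assumes the usual beta-reputation update with audit weight 1: on a failed audit, alpha becomes lambda*alpha and beta becomes lambda*beta + 1, and the score is alpha/(alpha+beta). It also assumes a node is disqualified once the score drops below the threshold. The function names are made up for illustration and are not taken from the satellite code.

```python
import math


def failures_until_dq(lam=0.999, alpha=1000.0, beta=0.0, threshold=0.96, weight=1.0):
    """Count consecutive failed audits until the score drops below the threshold.

    Assumed update rule for a failed audit (not taken from the satellite code):
        alpha <- lam * alpha
        beta  <- lam * beta + weight
    """
    failures = 0
    while alpha / (alpha + beta) >= threshold:
        alpha = lam * alpha
        beta = lam * beta + weight
        failures += 1
    return failures


def failures_until_dq_closed_form(lam=0.999, threshold=0.96):
    """Closed form, valid when the node starts at alpha = 1/(1-lam), beta = 0.

    In that case alpha + beta stays at 1/(1-lam), so the score after n
    consecutive failures is simply lam**n.
    """
    return math.ceil(math.log(threshold) / math.log(lam))


if __name__ == "__main__":
    print(failures_until_dq())
    print(failures_until_dq_closed_form())
```

Under those assumptions both functions give 41 for lambda=0.999, initial alpha=1000, and threshold=0.96, which matches the figure above. Changing `lam` or `threshold` shows how sensitive the time to DQ is to each parameter.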
The grace period I was suggesting (no disqualifications before 50 audits have been completed) no longer makes a significant difference with this high initial alpha, so I’m taking that out.
So the new change plan is:
- Change lambda to 0.999
- Use alpha=1000, beta=0 for new nodes
- Reset all node reputations to alpha=1000, beta=0
- Change the DQ threshold to 0.96
I think we can even do all of these at the same time. What does everyone think?