Tuning audit scoring

This is some really excellent analysis!

To be honest, we haven’t been putting a lot of effort into evaluating audit scoring in the past year or so. It’s been “reliable enough” that we haven’t needed to go back and see how the evolving network fits with what we came up with originally.

I’m going to try to replicate your findings briefly, and if my results match yours I expect we’ll go ahead and make the suggested changes. Thank you for your very in-depth work!

10 Likes

Thanks for the response, that’s great news!
I agree that the current system has been good enough, especially for reliability. There was never any risk and that’s priority one.

Let me know if there is anything else I can do to help with this!

1 Like

As I mentioned over in (broken link), I believe we’re going to go ahead and make your suggested changes, or something very close to it. I’ll update again when that happens.

3 Likes

Your link, though, shows this:

Oh dang, I guess that was a DM discussion. I thought it was a regular forum thread. I guess I’m not very good at this.

The gist of the message was simply:

I reproduced your results and have been modeling some scenarios to show how the change will affect different hypothetical SNOs. I still have some final tuning to do, but the change should be merged soon.

4 Likes

Coming back to this!

It’s been a long time and this has been through a lot of internal discussion. One of the first things I found was that it would be best for us to continue using the Beta Reputation Model as described here: https://www.cc.gatech.edu/fac/Charles.Isbell/classes/reading/papers/josang/JI2002-Bled.pdf . See also these papers describing how we are applying that model: Reputation Scoring Framework and Extending Ratios to Reputation. The benefit of sticking to this model, as explained to me by one of our data scientists, is that there is a solid mathematical underpinning to the reputations: a node’s reputation score is the chance that a random audit on that node will succeed, treating recent history as more relevant than distant history. This underpinning makes it easier to evaluate how well our parameters fit real life, and allows the evolution of reputation scores to be included more simply in larger mathematical models. Based on that, using a different adjustment value v for audit success versus failure was probably no longer an option.
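
For concreteness, the update rule we're talking about looks roughly like this (a rough sketch of the model as I understand it, not the satellite's actual code):

def update_reputation(alpha, beta, success, lam=0.95, v=1.0):
    # Decay the remembered history by lambda, then credit the weight v
    # (the same v for success and failure) to alpha or beta.
    alpha = lam * alpha + (v if success else 0.0)
    beta = lam * beta + (0.0 if success else v)
    return alpha, beta

def reputation_score(alpha, beta):
    # Estimated probability that the next audit succeeds, weighted
    # toward recent history.
    return alpha / (alpha + beta)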

Another feature of BrightSilence’s model was using a much larger initial value for alpha (1/(1-lambda) = 1000), rather than 1. This does have the effect of smoothing things out very considerably, but it would have been very difficult to apply to existing reputations.

I did some experimenting with other parameter sets, and decided that the main things we want to tune for are “how likely is disqualification for a node with acceptably low data loss” and “how likely is disqualification for a node with unacceptably high data loss”. To be clear, we want the former to be as low as feasible, and the latter to be as high as feasible. For the purposes of my calculations, I’ve used 2% data loss as “acceptably low” (this might be overly generous) and 4% data loss as “unacceptably high”.

I made a simulation tool to try out different sets of parameters within the Beta Reputation Model framework. You can find that as a Python script here, and some accumulated output from that script (as a CSV) here. Once we determined that we needed to keep the adjustments to alpha and beta as 1 and -1, I did some further investigation on the parameter sets that looked like the best fit, giving these final results: results of data loss versus DQ chance sim.csv. The last parameter set in that output is the one I like the best:

  1. grace period = 50 audits (nodes won’t be disqualified during the first 50 audits)
  2. lambda = 0.987 (the “forgetting factor”; raised from 0.95)
  3. DQ threshold = 0.89 (raised from 0.6)

With these parameters, a node with 2% data loss or less has a 0% chance of disqualification within 10,000 rounds. A node with 4% data loss has a 25% chance of DQ, and a node with 5% data loss has an 80% chance of DQ.

Compare that to the existing parameters, where (because of the overly large swings described by @BrightSilence above) a node with 2% data loss has a 1.6% chance of DQ, and (because of the overly low DQ threshold) a node with 5% data loss has a 4.3% chance of DQ.
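
If you want to reproduce numbers like these without digging through the full script, the Monte Carlo approach can be sketched like this (my illustration of the idea, not the actual simrep.py): each audit hits a lost piece with probability equal to the data-loss fraction, the reputation is updated, and we count how many runs cross the DQ threshold after the grace period.

import random

def dq_probability(loss, lam=0.987, threshold=0.89, grace=50,
                   rounds=10_000, runs=1_000, v=1.0):
    # Estimate the chance that a node with the given data-loss fraction
    # is disqualified within `rounds` audits under one parameter set.
    dq = 0
    for _ in range(runs):
        alpha, beta = 1.0, 0.0                 # initial reputation state
        for audit in range(rounds):
            ok = random.random() >= loss       # audit fails if it hits lost data
            alpha = lam * alpha + (v if ok else 0.0)
            beta = lam * beta + (0.0 if ok else v)
            if audit >= grace and alpha / (alpha + beta) < threshold:
                dq += 1
                break
    return dq / runs

# Expect dq_probability(0.02) to come out near 0 and dq_probability(0.05)
# to come out much higher, in line with the figures above.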

Therefore, what I propose now is making those changes (no DQ’s until 50 audits have completed, lambda = 0.987, and DQ threshold = 0.89). We’d make the lambda and grace period changes first, then wait for that to have an effect on scores before raising the DQ threshold.

Thoughts?

3 Likes

anything that reduces the random chance of DQ is good…
tho i think the whole concept of counting a certain number of audits for the grace period is a flawed perspective, because the more data a node has, the more audits it will get.

the proposed grace period would, in the case of my 16TB node, be equal to about 30 minutes.

i think the grace period would be best as a fixed time period, to ensure that the grace period has a similar effect for all node sizes.

will try to better understand the implications of the other suggested changes; i don't really feel i'm familiar enough with the concepts to judge them beyond what i already mentioned.

Thanks @thepaul for that great and extensive response and for sharing both your code and results!

I took some time to look through it all and play with the python script you provided. I’ll try to collect my initial thoughts here.

That is an absolutely fair criticism of my suggestion, and I agree that because of this, sticking with v=1 for both failure and success is not just preferable but critical, both for ongoing monitoring and for the resulting number to have any actual meaning. So, yes, no argument from me here. There are other, better ways to achieve our goals anyway.

I have to infer a little bit here why it is hard to adopt this. Is it because some of the scores may be quite low by chance under the current system, and it would take a really long time to recover from that with alpha at 1000? If so, I guess you could give node reputations a reset. The worst thing that could happen is that bad nodes stick around slightly longer, but the new parameters will still make quick work of them if they are actually bad.

This of course has to be priority 1 for the reputation system. Agreed. This needs to be ensured before we look at other criteria. But I don’t think we should lose sight of those. I’ll get back to that in a bit.

I really appreciate you sharing this! I'm already using your script to run more simulations.

I’m not entirely sure what this is for. By setting the initial score to 1, the beta formula already kind of has a grace period built in, because it assumes a prior history of good audits and slowly adjusts from there. I’m not against it, but it’s just not entirely clear to me why it is needed.

These are great results indeed!

So I think this is great and I’m pleased to see a lot of thought has gone into this subject. I want to briefly get back to the “issues” I listed in my original post and go through them.

  • The scores are extremely volatile and unpredictable
  • **Disqualification is inconsistent**
  • The timing of disqualification is entirely determined by random “luck”
  • A node operator might look at the score and see it increasing and assume issues are solved
  • **Nodes with over 10% of data lost are allowed to survive for very long periods of time (nearly indefinitely)**
  • **After failed audits, the score recovers just as fast, so failures never really stick**

I think the ones in bold (inconsistent disqualification, nodes with over 10% data loss surviving, and failures not sticking) are solved by the changes you propose. Though for the last one the changes don't necessarily give failures a much bigger impact on the score, the fact that the score dropped at all will now be visible for longer.

But I want to address the remaining ones as well.

  • The scores are extremely volatile and unpredictable
  • The timing of disqualification is entirely determined by random “luck”
  • A node operator might look at the score and see it increasing and assume issues are solved

I think I can address all three at once. I ran a quick simulation with your suggested parameters and 4% data loss.


I zoomed in on the relevant part of the graph, because the margin that tells the node operator whether there is an issue, and how bad that issue is, has now been reduced to 89-100%. As you can see, the score is unfortunately still all over the place within this margin, and there are moments (circled) when the node operator might think the problems are solved, even though 4% loss is enough that the node will almost certainly be disqualified eventually. The timing of that disqualification, however, is still very random.

So, what could be done about that? Well, the only way to stabilize the score is to increase lambda, giving the model more memory. In the end, what you really want to say is: if a node loses more than 4%, it's out! The only reason we're setting the threshold lower than 96% is to account for volatility. So I would suggest raising lambda to remove volatility, and raising the disqualification threshold even more because we don't need as much slack anymore. I ran the same simulation with lambda at 0.999 and the threshold at 95%.


Scores are more stable, disqualification is a little more predictable and a recovering score that shows more than 1% improvement almost certainly means the situation of the node has actually improved.
I used your script to run a few simulations with this:

./simrep.py -d 0.02 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.95 -a 1000 -b 0 -g 50
with 2.00% data loss, 0.00% of runs hit dq (after 0.00 rounds on avg)

./simrep.py -d 0.04 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.95 -a 1000 -b 0 -g 50
with 4.00% data loss, 36.40% of runs hit dq (after 6183.83 rounds on avg)

./simrep.py -d 0.05 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.95 -a 1000 -b 0 -g 50
with 5.00% data loss, 100.00% of runs hit dq (after 3035.49 rounds on avg)

Tune it even tighter by raising the threshold to 96% and you get exactly the result you wanted: no nodes disqualified at 2% data loss, all nodes disqualified at 4% data loss.

./simrep.py -d 0.02 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.96 -a 1000 -b 0 -g 50
with 2.00% data loss, 0.00% of runs hit dq (after 0.00 rounds on avg)

./simrep.py -d 0.03 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.96 -a 1000 -b 0 -g 50
with 3.00% data loss, 21.73% of runs hit dq (after 6275.94 rounds on avg)

./simrep.py -d 0.04 -r 10000 -x 3000 -l 0.999 -w 1 -q 0.96 -a 1000 -b 0 -g 50
with 4.00% data loss, 99.77% of runs hit dq (after 2906.48 rounds on avg)

Ok, I guess 99.77% isn’t all, but I hope you’ll forgive me that last 0.23%. :wink:

Of course, doing this would require addressing the issue you mentioned with implementing this on top of existing scores. Could you perhaps expand on that a little?
Additionally, it takes a few more audits to disqualify a node that lost all data, but as stated below… that may not be a bad thing.

Essentially, the balance between a high lambda and threshold versus a low lambda and threshold is one between consistency and speed to DQ. I don't know how important the speed to DQ is, but given that this adjustment would already ensure that far more of the less reliable nodes don't survive, I think it's probably OK to trade away a little speed to DQ.

There is one last concern I want to address.

What happens when temporary problems cause audit failures?

This may be especially relevant for the suspension score, which I assume would be aligned with the audit score system so as not to give malicious node operators an incentive to redirect known audit failures into unknown audit failures. Intermittent network issues or hanging systems can temporarily cause multiple audit failures in a row that don't reflect the durability of data on the node. Recently we've actually seen an increasing number of reports on the forum of nodes being suspended because of this.

With your suggested settings, it would take only 9 audits to be disqualified/suspended.


I took a quick look at my logs. That would give me just over 10 minutes to fix a temporary issue on my largest node for Saltlake.

2022-02-09T17:14:45.395Z        INFO    piecestore      download started        {"Piece ID": "YPQSHALIEQZEIRDCI2TAC4QFKAC4BR7X5NCAMYQ63AUZRVL5CPXA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:14:45.576Z        INFO    piecestore      downloaded      {"Piece ID": "YPQSHALIEQZEIRDCI2TAC4QFKAC4BR7X5NCAMYQ63AUZRVL5CPXA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:17:33.357Z        INFO    piecestore      download started        {"Piece ID": "4HGXC2QJBZZ4OF3XOZ2EOYOKJUF4X2WXW3DOVK3B5L2RQLM7O2KQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:17:33.524Z        INFO    piecestore      downloaded      {"Piece ID": "4HGXC2QJBZZ4OF3XOZ2EOYOKJUF4X2WXW3DOVK3B5L2RQLM7O2KQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:14.264Z        INFO    piecestore      download started        {"Piece ID": "NUP3WV4EHE2TKYRAHOMJNAYVJJLIAR4HDXY2DAXW4EGYO3FNB44A", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:14.426Z        INFO    piecestore      downloaded      {"Piece ID": "NUP3WV4EHE2TKYRAHOMJNAYVJJLIAR4HDXY2DAXW4EGYO3FNB44A", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:39.392Z        INFO    piecestore      download started        {"Piece ID": "3HKYXF5CHTH24JLBK6GDXE7C3Z6R6UBHLS6LMQ2E3XJW2E3RZFOQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:39.557Z        INFO    piecestore      downloaded      {"Piece ID": "3HKYXF5CHTH24JLBK6GDXE7C3Z6R6UBHLS6LMQ2E3XJW2E3RZFOQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:50.254Z        INFO    piecestore      download started        {"Piece ID": "QUTGWH4E7OLDF5WCYMJMCOVQS6QT5GRZEOKFHKAG4WZJLXPOOLKA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:18:50.433Z        INFO    piecestore      downloaded      {"Piece ID": "QUTGWH4E7OLDF5WCYMJMCOVQS6QT5GRZEOKFHKAG4WZJLXPOOLKA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:20:36.582Z        INFO    piecestore      download started        {"Piece ID": "MVK7W4PKFYJ7IGOKZ3MDY64I32TOGIF5CZO4IPKIWYLHX6YLHJFA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:20:36.762Z        INFO    piecestore      downloaded      {"Piece ID": "MVK7W4PKFYJ7IGOKZ3MDY64I32TOGIF5CZO4IPKIWYLHX6YLHJFA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:20:41.201Z        INFO    piecestore      download started        {"Piece ID": "3G5GZIVNQWRCYXS3ZOLGNNX5B5LP3RRG4BK75CLS64RKGSL2JUXA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:20:41.357Z        INFO    piecestore      downloaded      {"Piece ID": "3G5GZIVNQWRCYXS3ZOLGNNX5B5LP3RRG4BK75CLS64RKGSL2JUXA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:23:41.067Z        INFO    piecestore      download started        {"Piece ID": "4AHKUKGZ2OBELZRHQB6UPDF5K2OJZSAAZDPEJGRJHX22L57FKJKQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:23:41.260Z        INFO    piecestore      downloaded      {"Piece ID": "4AHKUKGZ2OBELZRHQB6UPDF5K2OJZSAAZDPEJGRJHX22L57FKJKQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:25:06.862Z        INFO    piecestore      download started        {"Piece ID": "MZ4VDL6NLGOVQ27ENNPHPUTNZAREL3OACRGGLTK3MZRM45NKEAYA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-02-09T17:25:07.240Z        INFO    piecestore      downloaded      {"Piece ID": "MZ4VDL6NLGOVQ27ENNPHPUTNZAREL3OACRGGLTK3MZRM45NKEAYA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}

Raising lambda would help with this, but I don't think it solves the core issue. Implementing 0.999 would raise the number of sequential failed audits needed to 52 with a threshold of 95%, or 41 with a threshold of 96%. On my node that would at best give me an hour to fix things. And depending on when such an issue occurs, I would most likely not be in time to fix it.
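
For reference, these counts can be derived with a quick back-of-the-envelope calculation (assuming a mature node at the perfect steady state, alpha = 1/(1-lambda) and beta = 0, in which case the score after n consecutive failures is simply lambda^n):

import math

def failures_to_dq(lam, threshold):
    # Smallest n such that lambda**n drops below the DQ threshold.
    return math.ceil(math.log(threshold) / math.log(lam))

print(failures_to_dq(0.95, 0.60))   # 10 -> current settings
print(failures_to_dq(0.987, 0.89))  # 9  -> @thepaul's proposed settings
print(failures_to_dq(0.999, 0.95))  # 52
print(failures_to_dq(0.999, 0.96))  # 41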

The problem is not the number of audits required to DQ, but that the limit isn't based on time instead. 52 audits is perhaps too long for new nodes, which can take weeks to get to that number, but way too short for larger nodes like mine. I don't yet have a suggestion on how to solve this, and perhaps it's a separate problem that requires a separate solution, but I wanted to mention it since it seems your suggested numbers actually make that slightly worse, rather than better.

Thanks again for picking this subject up though @thepaul, looking forward to your response!

5 Likes

as usual bright makes a very well considered argument, and one that i can only agree with.

the whole point of this was to smooth out the audit score, so that nodes don't get randomly DQ'd for no good reason, and increasing lambda to 0.999 seems to do just that.

there will ofc always be some flux in something that has an inherent random nature (the audits).
and even if it requires a reset of all audit scores / reputations, the damage from that should be minimal, especially considering that bright's previous simulations of the audit / DQ system showed it was possible for nodes with 12% data loss to survive.

implementing a new configuration of the Beta Reputation Model that rapidly DQ's nodes with more than 4% data loss would certainly make up for a reputation reset long term.

one thing that comes to mind tho…

what would be the effect if that many nodes are DQ'd in such a short time…?

i mean, there could be 1000 nodes with 12% data loss in the network, and DQ'ing all of those in a short time might lead to data loss for the network.

i would suggest easing into a changed lambda, or at least making sure the network can handle the possibly very big change in node numbers.

(thinking a bit more about that… maybe the proper way is to increase the DQ threshold over time to avoid a shock to the network.
then one could reset the reputations / adjust lambda, and raise the DQ threshold afterwards in steps)

so yeah long story short… i would also want to see lambda at 0.999 to decrease volatility.

1 Like

Great points! Thank you for the swift response.

This is a good point; however, this period would only apply to brand new nodes, not even vetted yet, which presumably don't have much data stored on them yet, so audits will be a lot slower in coming!

Yeah, current scores in terms of alpha and beta would be quite low. It’s been a few months since I really looked at this, so I hope you’ll forgive me not remembering exactly where the math was. I couldn’t figure out a way to adapt existing scores. It’s not exactly clear how reputations work when the initial values are that high. Normally, the sum of alpha and beta converges upward toward 1/(1-lambda), which is 20 right now, so the highest alphas are near 20. When alpha starts higher than 1/(1-lambda), I’m not sure what happens. Probably (alpha+beta) diverges instead, which seems like it would have some important connotations but I’m not sure what they are yet.

Resetting scores is an option, like you said, but I felt it was kind of unfair to people who have built up a strong reputation already. If the community feels otherwise, maybe that’s an option if we figure out how the math works.

I admit I don’t have a solid reason for it in terms of the mathematical model. Only what I’ve found through simulation: nodes are a lot more likely to be DQ’d unfairly during the first several rounds, while (alpha+beta) is still low compared to 1/(1-lambda). I think this is an intentional feature of the system; audit failures early on are a legitimately bad sign, but when the goal is to minimize the chance of “unfair” DQ’s, the grace period seems to be particularly helpful.

Regarding volatile reputation scores: I’m less convinced now that this is a problem. Downward spikes, of course, only happen when an audit has legitimately failed, and on a well-behaved node this should happen less than 1% of the time. Having the downward spike be large might well be required by the nature of the problem. The upward spikes are not as steep, but if you zoom out far enough then anything with lots of adjustments to it will look pretty volatile, unless it converges to the right answer so slowly that it’s not helpful to us.

In the same vein, I don’t know if we can avoid having the timing of DQ be dictated by random luck. When you have some data loss, the system can’t possibly know or react to that fact until it tries to audit a damaged piece, and the timing of that is all luck.

Your results look promising to me and I would support adopting them if we can figure out whether they still represent the beta reputation model.

9 audits to be disqualified certainly does sound harsh. On the other hand, in most temporary failure scenarios I can think of (disk not responding, disk too slow, configuration error pointing at wrong directory or filesystem mount, bad permissions, filesystem has gone readonly, etc) the node can detect the problem and shut itself down so that the node can be subject to suspension instead of disqualification. A node can recover from suspension, so having that be the penalty for being offline that long seems pretty fair. But if there’s another specific failure scenario that we should be considering, let’s definitely talk about that.

Lambda at 0.999 isn't quite the whole story: if I'm reading right, part of that smoothness comes from initializing alpha to 1000. I'm not sure whether that breaks the beta reputation model or not. I have some more research to do, I guess…

2 Likes

As far as I know, if the disk stops responding (or the kernel IO system freezes), the node will keep running, timing out all requests and, with the current system, be disqualified within 4 hours.
I think that 4 hours is way too short a time for this, and reducing it even more is wrong.
Unless the node CAN shut down when it gets multiple IO timeouts in a row; but then, if 8 failed audits lead to DQ, the node has to pretty much shut down after the first IO timeout (so that the node operator would have 7 attempts, though more likely 3, to restart it).

4 Likes

yeah, changing these kinds of things is / can be very dangerous if done without enough understanding of the future behavior.

you say it's unlikely to affect older nodes… but i have a node on which i lost a few minutes' worth of files because i accidentally ran rsync --delete instead of just rsync after a migration.
i know exactly how much ingress time was lost, and still the score was all over the place.

that node's audit score, tho stable, would until recently vary by about 5%…
it's been a while since i checked how much flux it has now… but i will…
and that is a node storing millions of files, and it's not like it's unheard of for nodes to get DQ'd by random chance… and that's where my problem lies with the current audit / DQ configuration.

it jumps around, and really that is just a math thing, there is no good reason it should do that…

there is nothing wrong with a node getting DQ'd fast, but seeing as DQ is a hard limit once the node hits it… having the score jump up and down is just unacceptable…

and for those of us with nodes that are years old, it is downright torture to watch…
granted, i don't have that problem, but i would like to know that some day in the future my node won't just end up being DQ'd because of a number that doesn't truly reflect the state of the data.

If this is the case, it can be fixed fairly easily. There’s a regular “writability check” that the node does to make sure that its data mount point is still present and writable. If the node fails that check, it will shut down. If that doesn’t already have a timeout, we can add one.

I'm not following this: are you responding to something I said? The only thing I said about older nodes was that the grace period would not apply to them.

I should really emphasize, though, that the reputation score is, by absolute necessity, probabilistic, and it has to swing both above and below the true data-loss measure. If it doesn't dip fairly quickly in response to failed audits, then bad nodes will affect the network for too long.

1 Like

I think I can respond to all of these by saying that I may have made the wrong assumption that alpha would be initialized at 1/(1-lambda). If that is not the case, there may indeed be some extra considerations. With alpha initially at 1 you do get some volatility early on, but it still stabilizes quite quickly. Here's an example with 2% data loss.

You can see that the first hundred audits or so result in quite volatile score changes, in this case leading to disqualification of a node with 2% loss. However, this would happen while the node is still in vetting, and perhaps some additional scrutiny is warranted during that time anyway.
That said, I would argue that this initialization at 1000 isn't essential; leaving it out just makes the first few hundred audits a little less stable. By the time nodes get vetted, the score will have stabilized enough to be a reliable indicator. And fluctuations during vetting can easily be explained by saying that the score doesn't have enough data to stabilize yet, which is exactly what the vetting period is for.

Any initialization of alpha is essentially giving the model a history that isn’t there yet. In this case even setting the value at 1 is akin to saying the node had 1 successful audit. Setting it to 1 is just a way of avoiding the initial division by 0 if you set it to 0.
Initializing both alpha and beta can be done with prior data if available as well. Say you know a node has had 100 successful audits and 2 failed ones. You could set alpha and beta to those values to seed the model with “perfect memory” of those stats. I don't believe doing something like that breaks the model in any way.
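
As a tiny illustration of that seeding (hypothetical numbers, just to make the point):

# Seed the model with a known prior history: 100 successes, 2 failures.
alpha, beta = 100.0, 2.0
score = alpha / (alpha + beta)   # ~0.98, consistent with that history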

I think this one was referring to the lambda change specifically. This change just reflects giving the model more memory. Instead of mostly counting recent audits, you look a bit further back: essentially you're using a larger set of audit results to determine reliability. The number still represents the same estimate, just based on a longer period of time. As far as I'm aware, that is an entirely valid way to tune the beta reputation model.

Having built up a strong reputation is relative. If you look at the current model, the strongest reputation you can have is an alpha of 20 and a beta of 0. Nodes that come out of vetting after just 100 audits already have an alpha of 19.88 and a beta of 0, as long as they haven't failed any audits. So really that reputation isn't all that meaningful. On good nodes, that would be a valid starting point to begin building an actually strong reputation with a higher lambda.
The problem comes in with nodes that are now looking pretty bad just because of bad luck. Here's an example of a worst-case-scenario node with 2% data loss, but due to bad luck having an alpha of 12 and a beta of 8 (so a score of 0.60, the current DQ threshold).

The score still fairly quickly adjusts to where it should be considering the 2% data loss. But in this example it takes about 200 audits to get above the threshold and about 400 total to get close to where the score should be. (rough estimates)

So, you could raise lambda, wait a few months and then raise the threshold. Or raise the threshold gradually over time. Or raise both gradually over time to compensate. This would all depend on how much of a risk you would be taking when raising just the lambda and not the threshold for a period of time.
If it were up to me, I would probably choose to raise lambda, use aggregate node stats to see when raising the threshold is reasonable, and slowly raise it over time. If you were to raise the threshold by 10 percentage points per month, there isn't a vetted node that would get into trouble because of that unless it had more data loss than is acceptable.

So, two remarks on this. The volatile scores are a problem for node operators who try to figure out what is wrong with their node and use the scores to gauge whether they fixed an underlying issue. At the moment, and to a lesser extent with your suggested changes, it is entirely possible that they see the score increase and think everything is now fixed, only to see it plummet afterwards; or worse, they stop paying attention and are surprised by disqualification later.
As for the spikes downward, they aren't caused by a single audit, but by several failed audits that are less spread out than they are on average. They are essentially noise: while the score should converge to the actual failure rate, the lack of memory means it's too “distracted” by recent audits to actually settle there.

I agree that those should be solved separately, though I'm not confident all of them have been accounted for, specifically a mount point disappearing while the node is running. I remember there is a periodic check for this, but as I previously mentioned, those 9 audits would happen in about 10 minutes on the largest satellite on my node. So unless that periodic check happens more frequently than that, it could be too late to prevent disqualification. As for suspension… if you can get into suspension quickly but also recover from it fairly quickly, I would personally be fine with that, though I think you would also have to factor in the cost, since it would still mark pieces as unhealthy and trigger repair.
I do think it’s vital to align how suspension and audit scores work, both to avoid abuse and to just make things simpler to understand for node operators.

Technically @thepaul's suggestion would make it worse… but only by one audit. Currently DQ takes at least 10 failures in a row from a perfect reputation; with the changes that would be 9. The chance that your node's survival hinges on that 10th one is pretty small. :slight_smile: It's just something I wanted to mention, not necessarily a new issue with the newly suggested changes.

Agreed; are there any guidelines as to what would be considered too long for this, though?

PS: I feel like I should point out that values between 0.987 and 0.999 exist. It's not an either/or situation; a compromise is possible to find the best of both worlds. I'm just using 0.999 as an example, and I quite like it because with that setting you don't seem to see random swings larger than 1%. That gives us a clear number to point at when trying to improve a node: any improvement larger than 1% probably means you did something right. That would be slightly harder to explain if you had to say 1.5%, for example.

1 Like

I want to clarify something before this discussion derails due to confusion over terminology.

There are currently 3 different reputation-related scores:

Audit score

  • Drops when known failures happen
  • Examples: node responds to an audit with “I don’t have that piece”, or responds with corrupt or wrong data
  • When below threshold: permanent disqualification

Suspension score

  • Drops when unknown failures happen
  • Examples: node responds with a connection error or other unexpected error; basically any error that isn’t missing or corrupt data
  • When below threshold: temporary suspension

Online score

  • Drops when the node is offline
  • Examples: an offline node doesn’t respond to the audit at all
  • When below threshold: temporary suspension

So far we've mostly been talking about the first one, but @CutieePie, your concerns would almost all relate to the second and third.

While the suspension score mechanism should probably be updated as well, the fact that it would only cause temporary suspension means you can easily recover from it once the problems are solved.

From what I can tell, most of the temporary issues that could lead to audit scores dropping or disqualification are now being detected and I rarely see people complaining about disqualification without data loss. It doesn’t help that some people say they’ve been disqualified when they have actually been suspended.

So while I share your concern to some extent, I don't want @thepaul's suggestions to be misrepresented as worse just because they don't fix a pre-existing concern that really wouldn't change much if these suggestions are implemented.

As for whether this is important now: well, I brought it up, so clearly I think it is. Inconsistent punishment of nodes based on random scores is demoralizing and discouraging, especially if those same inconsistencies let other nodes in much worse shape slip through the cracks. Furthermore, a more reliable base of nodes leads to less overhead, lower costs, and perhaps to us reliable node operators being able to get a bigger slice of the pie. Making the whole system more reliable and efficient is good for everyone involved.

3 Likes

I don’t think this is quite accurate. It isn’t as straightforward as “alpha = number of successes, beta = number of failures”. A failure changes both alpha and beta, and so does a success. You are correct that giving a higher initial alpha is seeding a positive history, but within the model as we’ve been using it, (alpha+beta) converges up toward 1/(1-lambda) as time goes on. The closer alpha is to 1/(1-lambda), the stronger the reputation history. But when alpha or beta is already equal to or greater than 1/(1-lambda), I don’t know if it can still converge toward that point anymore. I should really just find all the numbers and graph it somewhere, but I’m out of time tonight. Maybe I can do that tomorrow.

I’m not sure that follows. 19.88 looks close to 20 on a linear scale, but that doesn’t mean there’s no relevance to the difference. The difference between 19.9 and 19.99 is somewhat like the difference between “2 nines” and “3 nines” of availability, because alpha should constantly get closer to (but never actually reach) 20 as the history grows. (In practice, it probably can reach 20, because floating point values can only handle so much precision :smiley:)

Good idea! It wouldn’t have to be raised all at once.

The writability check happens every 5 minutes by default, and the readability check (making sure the directory has been correctly initialized as a storj data directory) happens every 1 minute. The readability check is the one that would help in that situation, so that’s good.

A really good point. It's been too long since I looked at the model assumptions for how long bad nodes can remain online. I'll try to find that. (The person who would normally know all of that, and who wrote the beta reputation model adaptation for Storj, left recently, so we're trying to keep up without them.)

Certainly true. And 0.999 is fine with me, as long as we make sure bad actors are DQ'd quickly enough (whatever that means).

It isn't exactly, but it's quite close. The only difference is that alpha and beta also “forget” history. Let's take the current settings, lambda at 0.95, and a perfect long-term node, so alpha at 20 and beta at 0.

What essentially happens when updating both alpha and beta is that the formula “forgets” a fraction 1-lambda (5%) of the historic count, dropping alpha to 19, and then adds the new value of 1. The only reason it stays at 20 is that at that level alpha * (1 - lambda) = 1, so you basically forget 1 and add 1 at that point.

The same operation happens for beta, but with beta being 0 it ends up being 0 * lambda + 0. Which isn’t all that interesting.

So you could see alpha as the number of successes it remembers and beta as the number of failures it remembers. And both are always updated because every new signal makes it forget part of the older signals.

It will. Going by the same numbers used above, say alpha somehow got to be 100 and beta 0. Because the model still forgets 5%, the next successful audit would drop alpha to 95 and then add 1 for the new success, leaving it at 96 in total. It would again converge toward 20 after enough audits.
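
A quick numerical check of that (just a sketch with the same numbers):

lam, alpha = 0.95, 100.0
for i in range(1, 201):
    alpha = lam * alpha + 1.0            # another successful audit
    if i in (1, 10, 50, 100, 200):
        print(i, round(alpha, 2))        # roughly 96.0, 67.9, 26.2, 20.5, 20.0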

I used just alpha as an example for these because it is easier to see, but this would also apply to the total of alpha + beta if there is a mix of signals. Because it loses 5% of the total preexisting values and always adds 1 to one or the other.

While you could technically say a node at 19.99 is a lot more reliable than a node at 19.88, it doesn't really help you at all. This is because, with the current numbers, alpha will drop by 5% on a failure, which is essentially a drop of about 1 either way. Since that difference is so minute, the much more reliable node would probably still be disqualified after exactly as many failed audits as the less reliable one. Being close to 20 makes it a lot harder to go up even higher, but barely has an impact on how fast it will go down. (On the other end of the scale, when you're near 0 it's very hard to go further down, but easy to go back up.)

Awesome, well that should cover it for now as long as I don’t get 10x as much data. But since this node already holds 16TB, that would be a great problem to have actually. :grin:

Oof yeah, I know how that feels. Seems like you’re managing quite well though.

I remember reading about someone getting disqualified in 4 hours because the system froze and the node would initiate a transfer, but then time out.

Returning corrupt data, or returning “file not found” with a separate check that the drive is still plugged in, usually means the node actually lost data. However, a timeout does not really mean that. The node could be overloaded or frozen.

That would be nice, though I do not know if it would work in that particular instance. AFAIK, part of the kernel froze and anything trying to access the disk would just sit in D state forever. I don't know if a timeout for that would work. If it would, that would be great, since IIRC that particular node was disqualified in a few hours for failing audits (4 audit timeouts → 1 failure, and in 4 hours the node got enough failures to be disqualified).

I am talking about this incident:

Yeah I do remember that. I actually ended up writing a blueprint after workshopping a possible solution for that with @Alexey in one of the topics where that happened. There’s also a suggestion for that here: Refining Audit Containment to Prevent Node Disqualification for Being Temporarily Unresponsive

I don't think that can really be fixed by tuning the scoring alone, though, and the suggested changes shouldn't really impact this (@thepaul's numbers just drop the number of audits from 10 to 9; my suggested numbers would increase it, but not enough to actually fix it).

The suggestion I posted above could help fix that.
Alternatively, you could set a lambda value per node based on how much data it has, so that lambda = 1 - (1/(number of audits for the node in the past 48 hours)), with a minimum of 0.999 or one of the other suggested numbers. This would essentially translate it into a time-based memory, but I don't know if this data is easily available to the process that updates the scores.
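
Something along these lines (a rough sketch of the idea; audits_last_48h is a hypothetical input, not an existing field):

def node_lambda(audits_last_48h, floor=0.999):
    # Derive a per-node lambda from the recent audit rate so the reputation
    # memory spans roughly the same wall-clock time for every node.
    if audits_last_48h <= 0:
        return floor
    return max(1.0 - 1.0 / audits_last_48h, floor)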