New audit scoring is live

As long as they are not generated in order, that should still make the order random. I can’t really explain the clustering in the graphs then, though. Unless these errors aren’t caused by missing data, but by slowdown on the node itself. In this case, though, I believe missing data was what caused the errors.

I think @SGC meant that it didn’t require 40% data loss to be disqualified before. Personally, I would consider that part of the problem with the old system. The threshold had to be that low because of the erratic score behavior, not because 40% data loss was actually allowed to survive. It became way too much a game of chance rather than a clear cutoff of what’s acceptable. The new scores still have a small margin of chance, but it dropped from a 20% range to a 2% range (roughly). So while the old system suggested that up to 40% loss was acceptable, anything beyond 5% loss could in some cases lead to disqualification after a long time. The new system may suggest 4% loss is ok, but anything beyond 2% loss is dangerous. That seems much more reasonable to me.

In the end, I think the reasons were clearly outlined in the linked topic and in my summary here. So @SGC, have a look at that summary, and if it’s still not clear why this was chosen, maybe you can be more specific?

sorry i might have missed it… but i just wanted to clarify… is DQ enforced now? does that mean when a node is disqualified moving forward, it is now permanent?

  • i recall there is currently leeway for a node to resolve their issue within a 7-day turnaround?
  • if a node’s underlying hard disk is starting to fail, but the data has yet to be migrated over to a new device… will it be DQed immediately?

lastly… is there a means for a node to quickly go offline once failed audits suddenly start increasing? as it’s my understanding that an offline node can be rescued and recovered with a longer leeway?
e.g… if the threshold is at 96% audit, can i configure it to go offline at 98%?
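Something like that could be scripted against the node’s local dashboard today, without waiting for a built-in option. A minimal sketch, assuming the SNO dashboard API is reachable at port 14002 and that the response field names (`audits`, `auditScore`) match the current API (they may differ between versions); the `0.98` floor and the container name `storagenode` are operator choices, not defaults:

```python
import json
import subprocess
import urllib.request

# Assumptions: local SNO dashboard API on port 14002; the "audits" /
# "auditScore" field names may differ between storagenode versions.
API_URL = "http://localhost:14002/api/sno/satellites"
SCORE_FLOOR = 0.98  # go offline before the 0.96 DQ threshold is reached

def should_go_offline(audit_scores, floor=SCORE_FLOOR):
    """True if any satellite's audit score has dropped below the floor."""
    return any(score < floor for score in audit_scores)

def check_and_stop():
    # Fetch per-satellite scores from the local dashboard API.
    with urllib.request.urlopen(API_URL) as resp:
        data = json.load(resp)
    scores = [sat["audits"]["auditScore"] for sat in data.get("audits", [])]
    if should_go_offline(scores):
        # Stopping the container trades further audit failures for
        # (recoverable) downtime counted against the online score.
        subprocess.run(["docker", "stop", "storagenode"], check=True)
```

Run from cron every few minutes; whether the trade-off pays off depends on how long the node stays offline afterwards.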

these are machines run at home, so the operations/support attention is minimal and not dedicated… so i just wanted to see, from a node perspective, what options are available?


i’m not talking about the DQ / data loss threshold %
but the actual audit score.

before it would DQ at 60% just like suspensions happen at 60% suspension score, or online suspensions happen at 60% online score.

now it will DQ at 96% audit score, which might seem a bit confusing.
personally i don’t mind that it actually shows the real world numbers instead of some rather arbitrary number…

but i think it would be less confusing if all the scores were bad at the same level, at least for those with zero knowledge looking at it…

so what i propose is that the 96% audit score becomes 60%. ofc that is also rather confusing, which is why i suggested the whole concept of how to easily read it might need to be revamped.

from what i understand / remember, the reason everything was going by the 60% concept was that the online score is over 1 month / 30 days, and 288 hours of downtime is 12 days, which is 40% of 30 days.

so for ease of use, that 60%-is-bad threshold was just imposed on the audit and suspension scores as well.
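the arithmetic behind that 60% online score floor, spelled out:

```python
# 288 hours of allowed downtime over a rolling 30-day window
allowed_downtime_days = 288 / 24                # 12 days
downtime_fraction = allowed_downtime_days / 30  # 0.4, i.e. 40% of the window
online_score_floor = 1 - downtime_fraction      # 0.6, the 60% threshold
```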

in short, it needs to be easy to read and learn the dashboard readout.
having multiple different thresholds makes it more difficult… but i must admit i kinda like it being the real world numbers.

maybe there are other ways to make it easy to read and simple to learn, the goal imho being the dashboard being readable with minimal knowledge.

It is always permanent.

for suspension, not disqualification. There is no suspension before disqualification, and there never has been. Suspension can happen when:

  • the suspension score falls below 60%;
  • the online score falls below 60%.

Disqualification now happens if the audit score falls below 96%.

It depends. If it has lost more than 4% of its data, it will be disqualified.

The current version of storagenode does not have this feature. But basically, yes, it could give you more time to fix an issue and deal with the consequences of a low online score later:

  • if your node is offline for more than 30 days, it will be disqualified anyway;
  • while your node is offline, it is losing data (offline data is considered unhealthy and can be repaired to other nodes; when you bring your node back online, the garbage collector will remove this data from your node. The longer you are offline, the more data will be moved from your node to more reliable ones).

But the score literally is a rolling estimate of the data retention percentage (the inverse of data loss). These aren’t two separate things. It’s just that the estimate used to have very little memory of previous audits, so it used to be all over the place. Now the estimate has more memory and is more accurate, so the threshold can (and has to) be a lot closer to the actual acceptable data loss.
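That “memory” can be sketched with the beta reputation model the scores are based on. The forgetting factor (here called `lam`) is what matters: closer to 1 means the score remembers more audits and moves more smoothly. The exact values below are illustrative assumptions, not the production parameters:

```python
def update(alpha, beta, success, lam=0.999, w=1.0):
    """Fold one audit outcome into the beta-model state.

    lam near 1.0 = long memory (smooth score); small lam = jumpy score.
    """
    alpha = lam * alpha + (w if success else 0.0)
    beta = lam * beta + (0.0 if success else w)
    return alpha, beta

def score(alpha, beta):
    # The displayed audit score is the mean of the beta distribution.
    return alpha / (alpha + beta)
```

With a long history of successful audits and a high `lam`, a single failure only nudges the score down slightly instead of cratering it.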

I don’t really get this, because 60% is even more arbitrary than 96%; it has less meaning. Maybe take a look at @CutieePie’s suggestion a few posts up, which happens to also be how reputation scores work in my earnings estimator. It’s basically replacing the actual percentages of data retention with a scale where 0% = all good and 100% = disqualified. That gets rid of all arbitrary numbers and replaces them with something everyone can understand intuitively.
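As a sketch of that mapping (using the 96% audit threshold discussed in this thread; the function name is mine, not from the estimator):

```python
def dq_progress(score, threshold=0.96):
    """Map a raw audit score onto 0% = all good, 100% = disqualified."""
    progress = (1.0 - score) / (1.0 - threshold) * 100.0
    # Clamp so already-disqualified scores don't read over 100%.
    return max(0.0, min(100.0, progress))
```

A node at a 0.98 audit score would then show as 50% of the way to disqualification, which is arguably far easier to react to than the raw number.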

I agree with your goal of making it easier to understand, but not this proposed solution.

Is this actually correct? I was under the impression that the system switched over for both scores, since they both use the beta model.

thanks for the clarification!
could we think about including this feature in future releases?

noted that when a node is offline it starts losing data, but a few days is probably what someone needs in order to replace/migrate the node… it would do good for longstanding nodes maybe?.. besides… i think the point is most of these machines are running at home while we are at work… and not everyone practices remote monitoring/management of the nodes, so an easy config helps a lot

on the note of 60% and 96%… why don’t we add finer details to the screen?
like additional details in a hidden layer that can be expanded, detail the following:

  • how many failed audits in the last 1000 rolling - e.g. 10
  • what’s the threshold measured against - for now 40

and of course other details that one may think that helps

All three scores are independent, but we can ask @thepaul


yeah i totally agree the 60% limit is very arbitrary, the 0 to 100% is a very good idea…
totally forgot about that…
i think we should push that as a feature request, but since it’s CP’s idea i don’t want to take the credit.
@CutieePie maybe you want to call for a vote on this?

We’ve been talking about this for quite a long time. Over 2 years ago I posted this mockup as a suggestion. But I guess I never posted a suggestion in the appropriate area.


However, this suggestion seemed to get at the same thing. But the wording of the suggestion was really unclear and the conversation died down quickly. Asking for more clarification and showing my suggestion didn’t get a further response: My Suspension score 100%


yeah that is basically the same concept, just with a bit more graphical detail.
i may have glanced at it when you first mentioned it… still a good idea.

0 - 100% would do the same thing, and the only modification would be to a couple of variables somewhere in the dashboard or storagenode api backend, i guess.

another thing we have to keep in mind is that StorjLabs most likely has plenty of stuff to do, so if it isn’t straightforward and there is not much benefit, they might disregard good ideas.

whoever makes the suggestion and whatever the best solution ends up being, it has my vote.
for now i would think the 0 - 100% has the best odds of being implemented and would do the job just fine.

non-ICE does make a valid point that it’s odd that suspension score is 100%
sort of sounds wrong or gives the wrong idea…
ofc setting that to 0% instead of 100% makes it also look kinda off…

so maybe suspension score could be renamed or something…
audit score 100% and online score 100% make good sense tho…

i suppose like non-ICE says, it doesn’t need to be called suspension score… just like audit score isn’t called DQ score.

maybe Health Score would work…
suspension score is a measure of faults or errors created by the node…
which is sort of a node health indicator…

might get confused with audit score… since it does sort of remind one of SMART health for HDDs or such.

so maybe another word… but something like health which most people would understand.
integrity score maybe… but i doubt that’s nearly as well understood a word, and again it could be misinterpreted to have something to do with HDD / storage

coherency… might work… but that is most likely a more obscure word than integrity for those with limited English skills.

but a node throwing errors is essentially incoherent and it isn’t because of the data being bad…

yeah… i kinda like that… coherence score

but i guess, system score, or consistency score might also work.

this is weird…
never actually had an unexpected audit drop before…
could this be some sort of rounding error or some such thing?

kinda doubt i actually failed an audit…
got two nodes showing this and both on ap1 only

can’t exactly rule out it being a hiccup with my storage, have had some issues during a recent replacement of some disks, but i believe zfs did manage to resolve the inconsistencies, which is sort of expected as i run with redundancy.

anyways, wanted to mention it, incase somebody else is seeing something similar.

it can be some restart downtime for an update. also, after an update, it makes the space calculation and can drop any repair traffic due to high I/O

That’s right, we did not change the “unknown audit” score model. It doesn’t have inputs coming in at anywhere near the same rate, and there isn’t any particular mathematical reason why it should operate in lockstep with the standard audit score, so it deserves its own analysis and research done before we change that part.


Well, I do agree with @BrightSilence: this matter has been discussed in the past, a few people suggested several great improvements for how scores could work and be displayed, but it never got much attention. I’m not sure why :confused:

The link I already posted in this thread ~10 days ago is not the only one where the subject was discussed, but it is votable and does gather many interesting ideas, IMHO.

If someone else has a better way of expressing those ideas so they get more traction, please be my guest, I’ll be happy to vote and support them! :slight_smile:


I think it would be quicker if someone from the Community made a pull request regarding this.


I don’t agree, Alexey (not with you, I appreciate you probably didn’t write this :smiley: ). The GitHub Contributor License Agreement (CLA) is a bit heavy and restrictive, and the information collected? It would be nice to be a bit more transparent. Why do you need someone’s home address, and then bother to follow this up as in this PR web/storagenode: fix bandwidth graph by CatInAHatIsBack · Pull Request #5020 · storj/storj · GitHub - is this so you know where to send the free “I :heart: Storj” T-shirt?

Also, for completeness: the Developer Certificate of Origin is a great alternative to a CLA.

It really doesn’t encourage creative ideas when the PR goes in and then you have to agree to and provide so much information. I’m not sure people like this, and it could be a blocker given what you are asking people to agree to. It’s more normal to just have the CLA, and you agree or decline it at the bottom…?

I think BrightSilence can take the credit :slight_smile: feel free to log it as an idea, SGC, and claim the glory.



Hah, I like how we pass credit around. But I’m just gonna mention that @Pac should get the credit.

I think the reality is that several people in this community came up with the same idea independently over the years. I guess that’s probably a good signal that it’s a pretty good idea. Let’s just share the credit for it as a community. @Pac’s suggestion linked earlier seems the most complete and already has a few votes. So I suggest we all just vote on that one: Dashboard UX: Percentages


VOTED @ Dashboard UX: Percentages


yes :slight_smile: and maybe more, but I will ask the team.