Question about Audit Score

Hello!
If I remember and read correctly, a node gets disqualified if the audit score goes below 96%. But why are my audit scores just yellow and not red if my node is only a bit away from disqualification? Or am I wrong, and a node just gets suspended at 96%? And at which score is the node disqualified then?

Thank you in advance
Marvi

I think the idea is it’s red when you’re already below the threshold set in code.

That being said, I think the “online” coloring doesn’t match Storj’s actual thresholds.

But also… do you know why your audit scores dropped? That’s usually bad and means either missing or damaged data.

But the online score turns red way before you get disqualified. Why not the audit score? I know the system got changed, but maybe not the threshold? I know this because the storagenode website (the one on port 14002) changes “All Healthy” from true to false and triggers my monitoring for the online score, but not for the audit score.

And yes, my HDD is failing, hence the dropping audit score. My monitoring shows this too:


(Red = Failed, Green = Success)

Do you have an Asus router with those Trend Micro add-ons enabled, like virus protection and so on? Those are bad for storagenodes.

I have an Asus router, yes, but everything is disabled there. And the storage node runs over a VPN because of CGNAT. It is definitely my HDD failing, because the other two are fine. I am just curious why the dropping audit score doesn’t trigger any alarm.

It doesn’t send any alarm. While the score is above 96%, it’s yellow; as soon as it is lower or equal, the node is gone.

But why not set “All Healthy” to false if the audit score is below something like 99% or 98%? The online score sets it to false way before disqualification too. I think it would be beneficial, because a node operator with active monitoring would then be able to repair the node before disqualification.
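Until something like that exists, an operator-side early warning could be sketched roughly as below. The 96% disqualification threshold is the figure quoted in this thread; the 98% warning level and the function name are my own hypothetical choices, and how the audit score is read from the node is deliberately left out.

```python
# Thresholds expressed as fractions.
# DQ_THRESHOLD is the figure quoted in this thread.
# WARN_THRESHOLD is a hypothetical early-warning level, not a Storj value.
DQ_THRESHOLD = 0.96
WARN_THRESHOLD = 0.98


def audit_alert_level(audit_score: float) -> str:
    """Map an audit score (0.0 to 1.0) to a monitoring alert level."""
    if audit_score <= DQ_THRESHOLD:
        # At or below the threshold the node is, per this thread, disqualified.
        return "critical"
    if audit_score < WARN_THRESHOLD:
        # Still above the threshold, but close enough to act on.
        return "warning"
    return "ok"
```

A monitoring system could then alert on "warning" rather than waiting for "critical".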

Because “All Healthy” is not a long-term indicator, but a time-of-check one. If the node has recovered from issues that resulted in failed audits in the past, you would like to verify that fact, and the “All Healthy” status item is exactly this. If you want a long-term indicator, that’s the scores themselves.

Okay, but what use is “All Healthy” if it only triggers once my node gets disqualified? Then it’s useless. For the online score it works fine: it warns me before my node gets suspended, so I can fix the issue. But not for the audit score.

No, a node does not have to be disqualified to not have an “All Healthy” status. If you lose connection to your ISP, you will be disqualified after 30 days, but you will get the “not healthy” signal likely within minutes. Again, time-of-check vs. long-term.

No, you get “All Healthy: false” if the online score drops below a set percentage (I think it was something like 94% or so). I don’t mean the connection timeout that would happen if the node goes offline. If the node gets into the red area, the site on :28967 changes its text from “all healthy: true” to “all healthy: false”, and at the same time the HTTP code changes from 200 “OK” to 500 “Internal Server Error” (or was it 503 “Service Unavailable”?).

EDIT: I mean the site on port :28967, not the dashboard on :14002.
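A check of this kind could look roughly like the sketch below. It relies only on the behaviour described above (HTTP 200 when everything is healthy, a 5xx code otherwise); the function names are hypothetical, and the exact URL to query is an assumption you would need to adapt to your own node.

```python
import urllib.error
import urllib.request


def status_is_healthy(status_code: int) -> bool:
    """Treat only HTTP 200 as healthy; 500, 503 and anything else count as unhealthy."""
    return status_code == 200


def check_node(url: str, timeout: float = 10.0) -> bool:
    """Fetch the status page and report whether the node looks healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_is_healthy(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return status_is_healthy(err.code)
    except (urllib.error.URLError, OSError):
        # No response at all (node offline, timeout, DNS failure).
        return False
```

For example, `check_node("http://<node-host>:28967/")`, with the host placeholder replaced by your node's address, would return `False` both when the page answers with a 5xx code and when it cannot be reached at all.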

I have this Site integrated into Monitoring:


And it changes the HTTP status according to the node status, depending on whether the scores are OK or in the red area:

By the code, the AllHealthy condition is set to false in the following cases:

  • Node is disqualified at any satellite.
  • Node is suspended at any satellite.
  • Node has an online score of less than 0.9 at any satellite.
  • Node did not connect to any satellite.
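
Expressed as a sketch, the four conditions amount to something like the following. This only illustrates the list above and is not the storagenode source: the `SatelliteStatus` type and its field names are hypothetical, and I am reading "did not connect to any satellite" as "no satellite was reached at all".

```python
from dataclasses import dataclass

ONLINE_SCORE_MIN = 0.9  # threshold quoted in the list above


@dataclass
class SatelliteStatus:
    """Hypothetical per-satellite view of the node; field names are illustrative."""
    disqualified: bool
    suspended: bool
    online_score: float
    connected: bool


def all_healthy(satellites: list[SatelliteStatus]) -> bool:
    """Return False if any of the four listed conditions applies."""
    # Assumed reading: unhealthy when not a single satellite was reached.
    if not any(s.connected for s in satellites):
        return False
    # Disqualification, suspension or a low online score at any satellite.
    # Note that the audit score is not part of the check.
    return all(
        not s.disqualified
        and not s.suspended
        and s.online_score >= ONLINE_SCORE_MIN
        for s in satellites
    )
```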

So I stand corrected, the AllHealthy status is not just set to false because of short-term problems, but also some long-term. Which is weird, but the case I remembered (lack of connectivity) is there.

The satellite ping runs every hour by default, so after at most an hour you would see AllHealthy set to false.

Thank you for providing the code!
It would be nice to add the other two stats to the AllHealthy status. If, for example, pieces are failing and having a huge impact, the node is in fact not healthy. The same would make sense for the suspension score too. So why not factor those two in as well, instead of just the online score and the cases where it is already too late?

Connection problems are easy to monitor: first you get emails about the offline node, and monitoring (if you have any) would report the lost connection.

But why not suspension and audit? Those two would go unnoticed until it’s too late.

TBH, I find the state of Storj node monitoring signals very messy. I wrote some code to collect signals from 5 endpoints to get a good-enough picture of the node state. It would indeed be nice to have one good place to see everything.

I shared your concerns with the team.
