Script to check API stats on public/private Storj servers

Hello, just sharing a script I modified from @kevink’s, which I use to check my node’s health using the dashboard API. It can be used with local and public servers that have the dashboard API exposed.

https://github.com/xyphos10/StorjScripts


Looks great! Thanks. Way nicer than my script :smiley:

Nice work! I will add a task to track this in the Grafana dashboard.

Nice work :+1:
I just noticed that your node has been paused on one satellite.
Please send a support ticket to support@storj.io with your NodeID and full docker run command.

@kevink @xyphos10 great script! I created a PR to add the uptime percentage and to make it default to 127.0.0.1 and 14002 if no parameters are given, since most people will likely run it on the node’s machine anyway.
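For readers who want to see what that defaulting behaviour amounts to, here is a minimal sketch in Python. It is not the code from the PR; the function names `parse_target` and `base_url` are made up for illustration, and only the two default values (127.0.0.1 and 14002) come from the post above.

```python
# Hypothetical helper names; only the default host and port come from the thread.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 14002


def parse_target(args):
    """Return (host, port) from positional arguments, using defaults when absent."""
    host = args[0] if len(args) > 0 else DEFAULT_HOST
    port = int(args[1]) if len(args) > 1 else DEFAULT_PORT
    return host, port


def base_url(host, port):
    """Build the base URL of the dashboard API for the given host and port."""
    return f"http://{host}:{port}"


# Example: no arguments given, so the local node is assumed.
host, port = parse_target([])
print(base_url(host, port))
```

Passing a host and port, for example `parse_target(["192.168.1.5", "14003"])`, would point the script at a remote node instead.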

These numbers are not correct!

The score is alpha / (alpha + beta). It has nothing to do with the total number of audits. It works more like a moving average. You might see something like 980/1000 total audits but a very bad score because of 20 failed audits at the end, or a very good score because the 20 failed audits happened a long time ago. So the total numbers are misleading.
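To illustrate why the order of failures matters, here is a rough simulation in Python. Only the formula score = alpha / (alpha + beta) comes from this thread. The update rule, the forgetting factor `lam = 0.99`, the weight `w = 1.0`, and the starting values are assumptions made for the sake of the example, so the exact numbers should not be read as what the satellites compute.

```python
def update(alpha, beta, success, lam=0.99, w=1.0):
    """One assumed update step: old results decay by lam, the new result adds w."""
    v = 1.0 if success else -1.0
    alpha = lam * alpha + w * (1 + v) / 2
    beta = lam * beta + w * (1 - v) / 2
    return alpha, beta


def score_after(results, alpha=1.0, beta=0.0):
    """Apply a sequence of audit results (True = success) and return the score."""
    for ok in results:
        alpha, beta = update(alpha, beta, ok)
    return alpha / (alpha + beta)


# Both histories are 980 successes out of 1000 audits.
failures_last = [True] * 980 + [False] * 20
failures_first = [False] * 20 + [True] * 980

print("failures at the end:  ", round(score_after(failures_last), 4))
print("failures at the start:", round(score_after(failures_first), 4))
```

Under these assumptions the history with the failures at the end scores noticeably lower than the one with the failures at the start, even though the totals are identical.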

Okay, I will revise as needed. The overall score that is shown is not a calculated value; it comes from the API. Would it make more sense to just show something like

Audits Health:
-Total Audits: xxxx
-Successful Audits: xxx
-Overall Score: xxx

Just making sure, you’re saying they are misleading, not that the actual numbers are wrong? I agree that the score is all that matters, but I’m still interested in the totals as well. I didn’t see any factual issue in the calculation but I may have overlooked something.

I would expect confusion like 980/1000 = 98% but the score is showing less than 60%. Maybe we can display these numbers in a way that explains it better.

In that case, why not just use the current score/maximum possible score as a percentage? Is that possible? What is the maximum score?

The score is already kind of a percentage, it’s between 0 and 1.

How about total audits without any percentage or score, and then the raw alpha and beta values and the score? We could call that recent audit results. It doesn’t explain how alpha, beta and the score are calculated, but it should at least explain that they depend on the recent audit results.
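A possible shape for that output, sketched in Python. The function name `format_recent_audits`, the labels, and the number formatting are all invented for illustration; the script itself may lay things out differently. The score is passed in as-is, since it comes from the API rather than being calculated by the script.

```python
def format_recent_audits(total_audits, alpha, beta, score):
    """Render the total on its own and group alpha, beta and score together."""
    lines = [
        "Audits:",
        f"- Total audits: {total_audits}",
        "Recent audit results:",
        f"- Alpha: {alpha:.2f}",
        f"- Beta: {beta:.2f}",
        f"- Score: {score:.4f}",
    ]
    return "\n".join(lines)


# Example values only, not taken from a real node.
print(format_recent_audits(1000, 81.79, 18.21, 0.8179))
```

Keeping the total separate from the score avoids suggesting that one is derived from the other.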

Best results would be:
Beta close to 0
Alpha close to 100 for audits and 20 for uptime
Score = alpha / (alpha + beta)
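As a rough check on where maxima like 100 and 20 could come from, the sketch below assumes alpha is updated with a forgetting factor `lam` and a weight `w`, so that a node with only successes settles at w / (1 - lam). The values `lam = 0.99` for audits and `lam = 0.95` for uptime are assumptions picked because they reproduce the 100 and 20 mentioned above; they are not confirmed anywhere in this thread.

```python
def steady_state_alpha(lam, w=1.0):
    """Limit of alpha = lam * alpha + w when every result is a success."""
    return w / (1.0 - lam)


def score(alpha, beta):
    """Score as given in the thread: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


# Assumed forgetting factors; see the note above.
print("audit alpha limit: ", round(steady_state_alpha(0.99), 2))
print("uptime alpha limit:", round(steady_state_alpha(0.95), 2))
print("best possible score:", score(steady_state_alpha(0.99), 0.0))
```

With beta at 0 the score is 1 regardless of how large alpha is, which is why the score already reads like a percentage.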

Okay, you mentioned that the highest alpha values are 100 for audits and 20 for uptime. Do those numbers start at 0 and then move up to the max or, like in V2, start at a high number and then reduce? Sorry for the question, I just want to understand the data more.

https://github.com/storj/storj/blob/86f4b41a70d6ad69c0220b6d82a5c91dee7e08b5/docs/datascience/extending%20ratios%20to%20reputation.pdf
I think this describes the current situation, but since everything is still being developed, it might be forward-looking as well.

That said, I don’t know if the individual alpha and beta values will mean much to anyone unless they have read and understood these documents. And for most SNOs that’s just major overkill.

How about something like this?
image
I can make another PR if you want. I removed some of the newlines and *** lines to make it more compact.

I don’t see the first pull request on GitHub. And wow, I wish I had those scores; the ISP in my country sucks T-T

My bad, I must have done something weird there. Couldn’t even find my previous commit. I created a new PR with all changes as displayed above.

To be honest, I had my own issues with downtime and I feel like the reputation recovers a bit too fast.

This is from uptime robot
image

I feel like that downtime of over 2 hours (ISP outage) should probably still impact my reputation, but apparently it does not. But I guess that’s a whole different discussion.

The current system tracks number of pings but not the time between them. It is very unfair and therefore we don’t use it right now. We are working on a better system to detect offline time and hopefully that should work better.

oh doh… I knew that, ignore my comment.
For others, discussion about that here: Design draft: New way to measure SN uptimes

I think for the typical SNO it would be enough to display the score, so they know if they are good, have to do better, or are paused.