Script to check api stats on public/private storj servers

xyphos10 · September 9, 2019, 4:59pm

Hello, just sharing a script I modified from @kevink’s, which I use to check my node’s health through the dashboard API. It works with local and public servers that have the dashboard API exposed.

https://github.com/xyphos10/StorjScripts

kevink · September 9, 2019, 9:11pm

Looks great! Thanks. Way nicer than my script

KernelPanick · September 9, 2019, 11:38pm

Nice work! I will add a task to track it in the Grafana dashboard.

Alexey · September 10, 2019, 12:55am

Nice work
I just noticed that your node has been paused on one satellite.
Please send a support ticket to support@storj.io with your NodeID and full docker run command.

BrightSilence · September 10, 2019, 8:25am

@kevink @xyphos10 great script! I created a PR to add the uptime percentage and make it default to 127.0.0.1 and 14002 if no parameters are given, since most people will likely run it on the node’s machine anyway.
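
For readers who want to see the defaulting behaviour in isolation, here is a minimal Python sketch of falling back to 127.0.0.1 and 14002 when no parameters are given. It is not the code from the PR; the function name `resolve_target` and the argument order (host first, then port) are assumptions made for illustration.

```python
# Hypothetical sketch: the helper name and argument order are assumptions,
# not taken from the actual script or PR.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 14002


def resolve_target(args):
    """Return (host, port), using the defaults for any missing argument."""
    host = args[0] if len(args) >= 1 else DEFAULT_HOST
    port = int(args[1]) if len(args) >= 2 else DEFAULT_PORT
    return host, port


# With no parameters, the local dashboard address is used.
host, port = resolve_target([])
print(f"Checking dashboard API at {host}:{port}")
```

Passing only a host keeps the default port, so a remote node on the standard port needs just one argument.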

littleskunk · September 10, 2019, 11:50am

These numbers are not correct!

The score is alpha / (alpha + beta). It has nothing to do with the total number of audits. It works more like a moving average. You might see something like 980/1000 total audits but a very bad score because the 20 failed audits came at the end, or a very good score because the 20 failed audits happened a long time ago. In short, the total numbers are misleading.
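
To make the moving-average effect concrete, here is a small Python sketch. The update rule is an assumption for illustration only: each audit decays alpha and beta by a forgetting factor `lam` and then adds 1 to alpha on success or to beta on failure. The factor 0.99 and the starting values are likewise assumed, not confirmed satellite settings. Both runs have 980 successes out of 1000 audits; only the position of the 20 failures differs.

```python
def update(alpha, beta, success, lam=0.99):
    """One assumed update step: decay old evidence, then add the new result."""
    alpha = lam * alpha + (1.0 if success else 0.0)
    beta = lam * beta + (0.0 if success else 1.0)
    return alpha, beta


def score_after(results, alpha=1.0, beta=0.0, lam=0.99):
    """Replay a list of audit outcomes and return alpha / (alpha + beta)."""
    for ok in results:
        alpha, beta = update(alpha, beta, ok, lam)
    return alpha / (alpha + beta)


# Same totals (980/1000), different ordering of the failures.
failures_at_end = [True] * 980 + [False] * 20
failures_at_start = [False] * 20 + [True] * 980

score_end = score_after(failures_at_end)
score_start = score_after(failures_at_start)
print(f"20 failures at the end:   {score_end:.4f}")
print(f"20 failures at the start: {score_start:.4f}")
```

Under these assumptions the run with recent failures scores clearly lower than the run whose failures are long past, even though the totals are identical.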

xyphos10 · September 10, 2019, 12:10pm

Okay, I will revise as needed. The overall score that is shown is not a calculated value; it comes straight from the API. Would it make more sense to just show something like this?

Audits Health:
-Total Audits: xxxx
-Successful Audits: xxx
-Overall Score: xxx

BrightSilence · September 10, 2019, 12:27pm

Just making sure, you’re saying they are misleading, not that the actual numbers are wrong? I agree that the score is all that matters, but I’m still interested in the totals as well. I didn’t see any factual issue in the calculation but I may have overlooked something.

littleskunk · September 10, 2019, 12:29pm

I would expect confusion along the lines of “980/1000 = 98%, but the score is showing less than 60%”. Maybe we can display these numbers in a way that explains it better.

xyphos10 · September 10, 2019, 12:31pm

In that case, why not just use the current score divided by the maximum possible score as a percentage? Is that possible? What is the maximum score?

BrightSilence · September 10, 2019, 12:33pm

The score is already kind of a percentage, it’s between 0 and 1.

littleskunk · September 10, 2019, 12:38pm

How about showing total audits without any percentage or score, and then the raw alpha and beta values and the score? We could call that recent audit results. It doesn’t explain how alpha, beta and the score are calculated, but it should at least explain that they depend on the recent audit results.

Best results would be:
Beta close to 0
Alpha close to 100 for audits and 20 for uptime
Score = alpha / (alpha + beta)
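
A rough Python sketch of that layout, assuming the total count, alpha and beta have already been read from the API. The function names and the sample values are made up for illustration and do not come from a real node or from the actual script.

```python
def score(alpha, beta):
    """Score = alpha / (alpha + beta); return 0.0 if there is no data yet."""
    total = alpha + beta
    return alpha / total if total > 0 else 0.0


def format_recent_audits(total_audits, alpha, beta):
    """Build the proposed 'recent audit results' block as plain text."""
    lines = [
        "Recent audit results:",
        f"- Total audits: {total_audits}",
        f"- Alpha: {alpha:.2f}",
        f"- Beta: {beta:.2f}",
        f"- Score: {score(alpha, beta):.4f}",
    ]
    return "\n".join(lines)


# Illustrative values only, not taken from a real node.
print(format_recent_audits(1000, 81.79, 18.21))
```

The total is shown without a percentage, so it is not mistaken for the score, which depends only on alpha and beta.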

xyphos10 · September 10, 2019, 12:48pm

Okay, you mentioned that the highest alpha values are 100 for audits and 20 for uptime. Do those numbers start at 0 and then move up to the max, or, like in V2, start at a high number and then reduce? Sorry for the question, I just want to understand the data better.

BrightSilence · September 10, 2019, 1:11pm

https://github.com/storj/storj/blob/86f4b41a70d6ad69c0220b6d82a5c91dee7e08b5/docs/datascience/extending%20ratios%20to%20reputation.pdf
I think this describes the current situation, but with everything still under development it might be forward-looking as well.

That said, I don’t know if the individual alpha and beta values will mean much to anyone unless they have read and understood these documents. And for most SNOs that’s just major overkill.

How about something like this?

I can make another PR, if you want. But I removed some of the new lines and *** lines to make it more compact.

xyphos10 · September 10, 2019, 1:14pm

I don’t see the first pull request on GitHub. And wow, I wish I had those scores; the ISP in my country sucks T-T

BrightSilence · September 10, 2019, 1:20pm

My bad, I must have done something weird there. Couldn’t even find my previous commit. I created a new PR with all changes as displayed above.

BrightSilence · September 10, 2019, 1:34pm

To be honest, I had my own issues with downtime and I feel like the reputation recovers a bit too fast.

This is from uptime robot

I feel like that downtime of over 2 hours (an ISP outage) should probably still impact my reputation, but apparently it does not. But I guess that’s a whole different discussion.

littleskunk · September 10, 2019, 1:57pm

The current system tracks the number of pings but not the time between them. It is very unfair, and therefore we don’t use it right now. We are working on a better system to detect offline time, and hopefully that will work better.

BrightSilence · September 10, 2019, 2:01pm

oh doh… I knew that, ignore my comment.
For others, discussion about that here: Design draft: New way to measure SN uptimes

kevink · September 10, 2019, 2:27pm

I think for the typical SNO it would be enough to display the score, so they know whether they are doing well, have to do better, or are paused.