Storage node dashboard

littleskunk · September 19, 2019, 4:31pm

For questions about the dashboard API, please use the Storage node dashboard API thread.

Let's use this thread to talk about the new web interface. Open http://127.0.0.1:14002 in your browser and take a look. If your storage node was paused on one satellite you will get a warning about that. I think everything else should be self-explanatory?

Some of you might see a false history on the 118 satellite. The reason for that is that we pushed the docker image while the satellite was still running the old version. There was a small time window for getting false information from that satellite.

BrightSilence · September 19, 2019, 4:50pm

Dashboard looks great! Saves a lot of effort with scripts etc.

I did notice a few things:

Uptime checks and audits are displayed as a % over the lifetime. This is not indicative of current reputation and sudden drops in audits could cause a node to be paused even though these percentages would still look fine. Why was this chosen over uptime and audit scores?
I noticed the minimum version is displayed as v.0.0.0, is this correct?
Audit check description sounds more like an uptime check description. It doesn’t mention it actually verifies you still have the data you should be keeping safe.

Odmin · September 19, 2019, 5:08pm

Is it possible to add a “Deposit” or “Escrow” metric to the dashboard?
I think SNOs would like to know how many tokens they have on deposit per satellite.
Another proposal: add node age to the dashboard, and the payout proportion based on age.

nerdatwork · September 19, 2019, 5:14pm

Overlapping progress bar.
Stats could be: Remaining/total allocated
E.g. 2TB/8TB

John.A · September 19, 2019, 5:15pm

Just a quick question about -p 14002:14002.
Does adding this open the port up to both LAN and WAN?
The port is not opened in my gateway, so I guess it stays closed on WAN?

Did I make any sense?

Dylan · September 19, 2019, 5:20pm

@John.A It should be -p 127.0.0.1:14002:14002

tikh · September 19, 2019, 5:25pm

Why was this chosen over uptime and audit scores?

What do you imagine this looking like? The uptime is typically displayed as a percentage, as seen in docs:

Minimum uptime (online and operational) of 99.3% per month, max total downtime of 5 hours monthly.

Makes sense to keep it consistent with uptime. What are your thoughts on that?

tikh · September 19, 2019, 5:25pm

Agreed! This is something we all want and is on the roadmap to add.

John.A · September 19, 2019, 5:26pm

It is, but I can't reach it over LAN, just locally.

tikh · September 19, 2019, 5:26pm

Audit check description sounds more like an uptime check description. It doesn’t mention it actually verifies you still have the data you should be keeping safe.

We need to update this. Actually, what’s your feedback on these descriptions?

Audit checks occur to make sure the data sent to a Storage Node is still held on the node and intact. This is the percentage of audit checks a storage node has passed.

Uptime checks occur to make sure a Storage Node is still online. This is the percentage of uptime checks a Storage Node has passed.

tikh · September 19, 2019, 5:27pm

Bug hehe, will be fixed

BrightSilence · September 19, 2019, 5:34pm

Those descriptions sound great

As for the percentage, I think the disconnect between lifetime percentages and the actual scores based on recent performance, which are used by the node to determine reputation and disqualification, might lead to some confusion. I would suggest taking the actual score (which is a number between 0 and 1) and multiplying it by 10 to make it more human readable. Of course then the descriptions should change a little as well.

Audit checks occur to make sure the data sent to a Storage Node is still held on the node and intact. This score is based on the recent percentage of audit checks a storage node has passed.

Uptime checks occur to make sure a Storage Node is still online. This score is based on the recent percentage of uptime checks a Storage Node has passed.

You could choose to include thresholds in the description by saying nodes get paused when the score drops below 6.

Of course the downside of this approach is having to point SNOs to complex beta calculation documentation if they want to know how that score is calculated.
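
A minimal sketch of the scaling suggested above, assuming the reputation score is a float between 0 and 1 as described in this post. The function names are illustrative, and the threshold of 6 is only the example value mentioned here, not a value taken from the storage node code.

```python
def display_score(score: float) -> float:
    """Scale a 0..1 reputation score to a 0..10 value with two decimals."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return round(score * 10, 2)


def is_below_threshold(score: float, threshold: float = 6.0) -> bool:
    """Return True when the scaled score is under the assumed pause threshold.

    The comparison uses the unrounded value so that rounding for display
    never changes the outcome.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return score * 10 < threshold
```

With two decimals, a score such as 0.999143615130869 is shown as 9.99 rather than being rounded up to a perfect 10.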

Dylan · September 19, 2019, 5:36pm

Update: If you would like to make it reachable from your LAN, you can remove the 127.0.0.1 from the docker run command.
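
To confirm where the dashboard actually answers after changing the -p mapping, a small TCP probe can help. This is only a sketch: the function name is made up for illustration, and it checks nothing beyond whether a TCP connection to the port succeeds.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open("127.0.0.1", 14002) on the node itself checks the loopback binding. Running the same call from another machine with the node's LAN address (whatever that is on your network) shows whether the port is reachable beyond localhost.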

John.A · September 19, 2019, 5:46pm

Awesome. Thanks a lot

tikh · September 19, 2019, 7:13pm

This is an interesting idea! The downside is tough though; we definitely want this to be fairly intuitive. This is something we can work to evolve over time.

SlavikCA · September 19, 2019, 8:28pm

What if I expose the dashboard to WAN?
Is it a bad idea? Any known issues?

Dylan · September 19, 2019, 8:35pm

Please do not expose it to WAN: there is no authentication, so it is a security risk. Only run it on your local network.

SlavikCA · September 19, 2019, 8:44pm

Yes, that’s my question: what’s the risk?
There is no confidential information I can see on the dashboard.

I'll even volunteer and expose my dashboard:
http://slavikca.myds.me:14002

Now, what is the worst that can happen?

xyphos10 · September 19, 2019, 8:44pm

If you want to access it via WAN, one alternative is to set up Apache (as a reverse proxy to the storj API on 14002) with a Let's Encrypt certificate and add a firewall rule to only allow access from your IP and not the world.

Pentium100 · September 19, 2019, 8:57pm

I have noticed some inconsistencies:
1. If I go on the new dashboard and choose the satellite 118U, I get:
Uptime Checks 99.6%
Audit Checks 99.9%

According to this, I’m halfway to getting disqualified due to uptime, but if I use the API, I get:

curl http://localhost:14002/api/satellite/118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW | jq .data.uptime
{
  "totalCount": 217696,
  "successCount": 216776,
  "alpha": 99.91436151308675,
  "beta": 0.08563848691309517,
  "score": 0.999143615130869
}

So, the new dashboard shows the 100*successCount/totalCount value and not the score? Shouldn’t it show the score too, since the score is what’s important for not getting disqualified, right?

Same with audits, it shows 99.9% success, but the API gives this:

{
  "totalCount": 49040,
  "successCount": 49002,
  "alpha": 19.99999999999995,
  "beta": 4.4e-323,
  "score": 1
}
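
The gap between the two numbers can be reproduced with a short sketch. It assumes the score is alpha / (alpha + beta), which matches both API responses quoted above but is not confirmed anywhere in this thread. The function names are illustrative.

```python
def lifetime_percentage(success_count: int, total_count: int) -> float:
    """Percentage of all checks ever passed (what the dashboard appears to show)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * success_count / total_count


def beta_score(alpha: float, beta: float) -> float:
    """Assumed reputation score: the share of alpha in alpha + beta."""
    return alpha / (alpha + beta)


# Values copied from the two API responses quoted in this post.
uptime = {"totalCount": 217696, "successCount": 216776,
          "alpha": 99.91436151308675, "beta": 0.08563848691309517}
audit = {"totalCount": 49040, "successCount": 49002,
         "alpha": 19.99999999999995, "beta": 4.4e-323}

for name, stats in (("uptime", uptime), ("audit", audit)):
    pct = lifetime_percentage(stats["successCount"], stats["totalCount"])
    score = beta_score(stats["alpha"], stats["beta"])
    print(f"{name}: lifetime {pct:.1f}%, score {score:.6f}")
```

Under that assumption the uptime counters give a lifetime value of about 99.6% while the score stays above 0.999, so the dashboard percentage alone says little about how close a node is to disqualification.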

For that same satellite (118U) it shows:

The bandwidth value is probably correct, but I really doubt the disk space value.

More interesting is this graph:

How can the GB*h value go down in a few places? Even if all data was deleted, the value should stay constant, right?