Suggestion for a new feature: "RaceWin algorithm"

Warezzo · December 6, 2022, 6:32pm

Hi community,

This is my first post, but I’ve been an active node operator for 3 years.
Sorry for my bad English; I don’t speak it very well.

I have a suggestion for the next release.
Is it possible to develop and add to the dashboard a mechanism to count and show race wins?

I will try to explain.

Currently, I don’t know whether my node is performing well, whether my setup is optimal for winning as many races as possible, whether I’m exposing super slow storage, or whether the type of disk I’m buying is garbage.

If it is possible to develop some kind of counter based on an algorithm, all storage operators could see immediately, and at their own discretion, whether they should adjust their performance for the customer.
I think this would help anyone who wants to offer the best possible service.
The algorithm should not be used to penalize the slowest storage (it is already “penalized” by not winning every race), only to say “hey, your storage is orange; if you want, waste your money installing some PB of NVMe”.

I’m not sure whether I have explained the concept correctly, but what do you think about it?

Vadim · December 6, 2022, 6:56pm

As far as I know, there are several scripts that can tell you how successful your node is.
So I don’t see any reason for the developers to build it.
Also, if you set your node to the WARN log level, you will see only warnings and errors in the logs, and from those you can estimate how many races you did not win on download or upload.
So it is all simple.

Warezzo · December 6, 2022, 7:05pm

Hi Vadim, thanks for your feedback.
So, why would I need to launch a script when there is a simple dashboard?

Just to understand: are WARN and ERROR my racewin indicators?
Or are they just another parameter/indicator of some misconfiguration or anomaly?

Pentium100 · December 6, 2022, 7:42pm

The logs contain uploads and downloads and their results.
Every upload begins with a line that says “upload started” and ends with a line that says “uploaded”, “upload canceled” or “upload failed”.
To get the success rate, count the lines that say “uploaded” and divide that number by the number of lines that say “upload started”.

The same applies to downloads; in that case the lines say “download started” etc.

To further break it down you can group the log entries by what the action is: GET, GET_REPAIR, GET_AUDIT, PUT, PUT_REPAIR and calculate the success rate of each (GET_AUDIT and GET_REPAIR should be 100%).
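
A minimal Python sketch of the counting described above, in case it helps. It assumes each log line contains one of the phrases mentioned ("upload started", "uploaded", "download started", "downloaded") and carries the action as a field that looks like "Action": "PUT". That field layout and the sample lines are assumptions for illustration, so adjust the pattern to whatever your log actually looks like.

```python
import re
from collections import defaultdict

# Assumed layout: the action appears as "Action": "PUT" somewhere in the line.
ACTION_RE = re.compile(r'"Action":\s*"([A-Z_]+)"')

def success_rates(lines):
    """Return {action: finished / started} for uploads and downloads."""
    started = defaultdict(int)
    succeeded = defaultdict(int)
    for line in lines:
        match = ACTION_RE.search(line)
        if not match:
            continue  # line has no action field, not a transfer line
        action = match.group(1)
        if "upload started" in line or "download started" in line:
            started[action] += 1
        elif "uploaded" in line or "downloaded" in line:
            succeeded[action] += 1
        # "canceled" and "failed" lines count only through the started total
    return {action: succeeded[action] / count
            for action, count in started.items() if count}

# Hypothetical sample lines, made up for this example.
sample = [
    'INFO piecestore upload started {"Action": "PUT"}',
    'INFO piecestore uploaded {"Action": "PUT"}',
    'INFO piecestore upload started {"Action": "PUT"}',
    'INFO piecestore upload canceled {"Action": "PUT"}',
    'INFO piecestore download started {"Action": "GET"}',
    'INFO piecestore downloaded {"Action": "GET"}',
]
rates = success_rates(sample)
```

For a real log you would pass the open file instead of the sample list, for example success_rates(open(path)) with path pointing at your node's log file.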

andrew2.hart · December 6, 2022, 8:30pm

This is a great idea.

The only problem is the limited amount of developer time, which I hope is spent on improving S3 compatibility.

Nice one

littleskunk · December 7, 2022, 11:15am

There is no need to count log lines. The debug endpoint already contains all of these numbers. I am using a grafana dashboard to display it.

Pentium100 · December 7, 2022, 11:40am

That’s interesting. I’ll have to look into it as a lot of my graphs are made from the log.

snorkel · December 7, 2022, 6:18pm

@littleskunk
Can you point us in the right direction for this grafana thing? Where can we find it and how do we run it?
Does it take up space on the HDD, like logs do?
I want a deeper view of my nodes’ activity, but just in a dashboard/window, not in some other log that will grow over time.

littleskunk · December 7, 2022, 7:10pm

You can reduce the log level and the metrics endpoint would still keep that information.

Can you maybe just copy it out of my grafana dashboard? [Tech Preview] Email alerts with Grafana and Prometheus

I haven’t updated it in a while. Recently the audit and online score were also added to the metrics endpoint. I have added it to my dashboard but have not published it yet. Give me a few minutes to push my current version even if there are still a few changes I would like to apply.

Edit: The grafana dashboard is optional. You could also hit the metrics endpoint with your browser and search for that value. My grafana dashboard just contains the label you have to search for.
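
If you would rather script the "search for that value" step than use the browser, here is a rough Python sketch. It assumes the endpoint returns plain text with one "name{labels} value" sample per line and "#" comment lines (the usual Prometheus text layout, without trailing timestamps). The metric names in the sample are made up; the real labels are the ones in the dashboard.

```python
def search_metrics(text, needle):
    """Return {sample name: value} for every metric line containing needle."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if needle not in line:
            continue
        # Assumes the value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        try:
            results[name] = float(value)
        except ValueError:
            continue  # not a numeric sample, ignore it
    return results

# Hypothetical endpoint output, for illustration only.
sample_text = """# HELP example_upload_success made-up metric
example_upload_success{action="PUT"} 42
example_other_metric 7
"""
found = search_metrics(sample_text, "upload")
```

To run it against a node, fetch the text first (for example with urllib.request.urlopen on whatever address your debug endpoint listens on) and pass the decoded body in as text.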