Grafana Metrics

Grafana is awesome. It would allow the storage node community to set up a Grafana dashboard including email alerts. The email alerts could fire within 5 minutes of a problem being detected, instead of hours later when the satellite might notice it.

Currently, there is already a Grafana dashboard, but it requires running a log scraper process. It would be nice if the storage node could send these metrics directly. The storage node already contains a metrics endpoint, so we just need to make sure the data that the log scraper collects is also available on that endpoint. Maybe we can add a bit of documentation around it so that the community is able to extend the metrics endpoint whenever they wish to add something to the Grafana dashboard.

Oh well, I would certainly appreciate an endpoint for all the information I currently scrape from the logfiles.
Even better would be native support for a prometheus endpoint :smiley: Then we wouldn’t need to develop any exporters ourselves :wink:

I played around a bit with Grafana email notifications and they work well, even though the docker installation doesn’t support sending pictures of the graph that triggers the notification… I have to play around with it a little more to see what else I can use for alerts. It seems like the Stat widgets don’t support sending an alert, only the graphs do?


I am at beginner level and have only a little knowledge of how a native integration would work. I just see how ETH2 is handling this: https://github.com/stefa2k/prysm-docker-compose/blob/add38e2e7c3f6d33f3b72ca219a0676925f665d4/config/prometheus.yaml#L18-L26

As far as I understand this should also be possible with all of our binaries. We have that endpoint as well. We are currently missing the information you would need for your dashboard.

That won’t work with your binaries because they don’t expose a prometheus compatible output. That’s why we need the exporter from @greener that transforms the API output into a prometheus format.
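To make the translation step a bit more concrete, here is a minimal sketch in Python (the exporter's language) of turning a flat API payload into Prometheus text exposition lines. The field names, the `storj` prefix and the `to_prometheus_text` helper are made up for illustration; they are not taken from the actual exporter or from the node API.

```python
import re


def to_prometheus_text(payload, prefix="storj"):
    """Render the numeric fields of a flat dict as Prometheus text exposition lines."""
    lines = []
    for key, value in sorted(payload.items()):
        # Only numbers can be metric values; skip strings, bools, nested data.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Metric names may only contain letters, digits and underscores here.
        name = prefix + "_" + re.sub(r"[^a-zA-Z0-9_]", "_", key)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "".join(line + "\n" for line in lines)


# Hypothetical payload: these field names are assumptions, not real API output.
sample = {"diskUsed": 1024, "diskAvailable": 4096, "nodeID": "abc"}
print(to_prometheus_text(sample), end="")
```

Everything is exported as a gauge in this sketch because that is the simplest safe choice for values read from an API snapshot; a real exporter would pick counter or gauge per metric.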

I don’t have any idea about how to make a prometheus endpoint in go and the exporter uses python so… :man_shrugging:

I really :heartpulse: this idea! I used Grafana + telegraf for collecting metrics via the API and it really was a pain to parse it in telegraf. I will be more than happy (really) if storagenode can send it directly to influxdb or via the graphite protocol (influxdb has a graphite listener too).

Also, if you need any help with this, please involve me. I will be more than happy to help with it.

@littleskunk here are good examples of how other systems send metrics directly:

TrueNAS

ProxMox

It also supports sending alerts to Telegram. I use it and it is much better than email for critical alerts (but I am using email alerts too).

@littleskunk there is a Go library for prometheus:

With an example implementation: https://github.com/prometheus/client_golang/blob/b7799362e0ac323f658fb8d52c2d6df001cf272c/examples/random/main.go

I know this probably won’t be considered anytime soon since it requires engineering efforts but maybe some day in the future?
For now we have a good exporter in python to transform the API output into a prometheus endpoint that can be easily extended (sometimes the API output is just a bit weird and unexpected :smiley: )


There are multiple community solutions that help visualise storagenode state in Grafana:

As @kevink rightly pointed out, current storj binaries don’t expose anything that could be used with grafana directly. Some metrics are available in the node API, and this exporter translates these metrics to the prometheus compatible format, which is pretty specific.

To get an idea of the format difference you can try spinning up the exporter yourself with docker run -d --link=storagenode --name=storj-exporter -p 9651:9651 anclrii/storj-exporter:latest; then curl -s storj-exporter:9651/metrics will give you the output that storj binaries would need to expose.

If this is implemented, Prometheus would be able to scrape storagenodes directly rather than through my exporter, and this would be a much better solution: metrics could be updated along with the rest of the app, better performance etc. Though even if storj binaries exposed prometheus compatible metrics, one would still need a prometheus server between node and grafana to scrape and store historic metrics, and also a grafana instance for dashboards.
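For anyone wondering what "scraping directly" means in practice, here is a rough sketch of a scrape target using only the Python standard library. The real storage node is written in Go, so this is just an illustration of the idea, not how it would be implemented there. The metric names are hypothetical, and port 9651 is simply reused from the exporter command above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counters; a real node would update these as transfers happen.
COUNTERS = {
    "storagenode_upload_success_total": 0,
    "storagenode_upload_failed_total": 0,
}


def render_metrics(counters):
    """Render name/value pairs as Prometheus text exposition lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(counters.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet instead of logging every scrape


def serve(port=9651):
    """Block forever, serving the current counters on 127.0.0.1:<port>/metrics."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and then running curl -s localhost:9651/metrics should return the two counter lines. The prometheus server would poll that URL on its scrape interval and keep the history.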

I think it would be good if storj binaries exposed total/successful uploads/downloads; a generic error count might also be useful. Currently we need to parse logs to get these metrics and it’s hard. It would be good to add these to the node API for a start so that I can translate them to prometheus. Eventually exposing all metrics in prometheus compatible format would be awesome, but I guess that would need more time to implement.
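To show roughly what the log parsing looks like today, here is a small Python sketch that counts total/successful transfers and errors from log lines. The marker strings ("upload started", "uploaded", and so on) and the counter names are assumptions for illustration; check them against your own node's log output before relying on them.

```python
from collections import Counter

# Assumed log markers mapped to hypothetical counter names.
MARKERS = {
    "upload started": "uploads_total",
    "uploaded": "uploads_successful",
    "download started": "downloads_total",
    "downloaded": "downloads_successful",
}


def count_transfers(log_lines):
    """Count transfer events and ERROR lines in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        for marker, name in MARKERS.items():
            if marker in line:
                counts[name] += 1
        if "ERROR" in line:
            counts["errors_total"] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "INFO piecestore upload started",
    "INFO piecestore uploaded",
    "ERROR piecestore upload failed",
    "INFO piecestore download started",
]
print(dict(count_transfers(sample)))
```

This is exactly the kind of substring matching that gets fragile when log messages change, which is why having these counts in the node API would be nicer.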


I’d release a new version of the board with this info. Early on logs were all we had, but def would be excited to standardize it better.
