Prometheus Storj-Exporter

greener · September 26, 2019, 10:39am

I mostly use Prometheus for monitoring and needed a Storj exporter to track things like audit counts and uptime per satellite, data as reported by the storagenode, etc.
So I made one:

Check out the README to set up Storj-Exporter.

Update exporter to latest version:

Use watchtower or update manually using the following commands:

docker pull anclrii/storj-exporter:latest ; \
docker stop storj-exporter ; \
docker rm storj-exporter ; \
docker run -d --restart=unless-stopped --link=storagenode --name=storj-exporter -p 9651:9651 anclrii/storj-exporter:latest # or your custom options

Installing full monitoring stack

Monitoring stack requires the following components:

Storj-Exporter - collects storagenode metrics
Prometheus server - scrapes and stores metrics from Storj-Exporter
Grafana - pulls data from Prometheus and visualises it
- Storj-Exporter-dashboard - a template dashboard for Storj-Exporter metrics

Here’s how metrics flow through the stack:

Storagenode => Storj-Exporter => Prometheus server => Grafana (Storj-Exporter-dashboard)

Installation guides:

Official installation docs for Prometheus/Grafana:

Prometheus - Installation | Prometheus
Grafana - Install Grafana | Grafana documentation

F.A.Q:

Why do the Summary tables (boom table) show “No data” for the DQ and SU columns, and why doesn’t the sorting match the screenshots?
- There are bugs in v1.4.0 of the plugin. Installing yesoreyeram-boomtable-panel plugin 1.3.0 resolves this.

kevink · September 26, 2019, 12:12pm

Works great so far. Thanks.
Do you have multiple nodes to see how those could be visualized together?

greener · September 26, 2019, 12:32pm

I use metrics queries like storj_sat_audit_successCount{instance=~"$node.*"}
and if panel options are set as below it will generate a panel per node.

repeat

You could probably also show per-satellite graphs for multiple nodes on a single panel using the query above, so that it would plot 8 graphs for 2 nodes.
I am planning to support multiple nodes on the dashboard.
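As a rough illustration of how a matcher like instance=~"$node.*" selects series once Grafana substitutes $node, here is a small Python sketch. The instance labels and the node1 value are made up for the example. Prometheus anchors label regexes at both ends, which re.fullmatch approximates here (Prometheus uses RE2 syntax rather than Python's re, so this is only an approximation for simple patterns).

```python
import re

def matching_instances(node, instances):
    """Return the instance labels that instance=~"<node>.*" would select."""
    # Label regex matches are fully anchored, hence fullmatch.
    pattern = re.compile(node + ".*")
    return [inst for inst in instances if pattern.fullmatch(inst)]

# Hypothetical instance labels, not taken from a real setup.
instances = ["node1:9651", "node2:9651", "node10:9651"]
print(matching_instances("node1", instances))
```

Note that a plain prefix match also picks up node10 in this made-up list, so a tighter pattern may be needed when node names share a prefix.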

greener · October 6, 2019, 11:23am

v0.2.0 update

I added some missing metrics that are available in the SN API and refactored some repetitive code. I also aggregated most metrics by adding labels to a metric group instead of having a separate metric for each item.

I noticed 100+ downloads on Docker Hub, so I’m going to keep both the old metrics and the new aggregated metrics until at least the v1.0.0 release, since removing the old ones would break existing dashboards. Please switch to the new metric names/labels before then.

For example:

storj_sat_audit_successCount{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 66.0
storj_sat_audit_totalCount{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 66.0
storj_sat_audit_beta{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 0.0
storj_sat_audit_alpha{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 19.0
storj_sat_audit_score{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 1.0

becomes:

storj_sat_audit{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",type="successCount"} 66.0
storj_sat_audit{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",type="totalCount"} 66.0
storj_sat_audit{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",type="beta"} 0.0
storj_sat_audit{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",type="alpha"} 19.0
storj_sat_audit{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",type="score"} 1.0
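To make the renaming rule concrete, here is a minimal Python sketch that rewrites an old-style audit sample line into the new labelled form shown above. It is only an illustration of the naming change, not code from the exporter; the function name to_aggregated and the regex are assumptions, and it deliberately handles only the storj_sat_audit_* lines from the example.

```python
import re

# Old-style line: the item is part of the metric name,
# e.g. storj_sat_audit_successCount{satellite="..."} 66.0
OLD_AUDIT = re.compile(r'^(storj_sat_audit)_(\w+)\{(.*)\}\s+(\S+)$')

def to_aggregated(line):
    """Move the trailing item name into a type="..." label (hypothetical helper)."""
    m = OLD_AUDIT.match(line)
    if m is None:
        return line  # leave anything that is not an old audit sample untouched
    group, item, labels, value = m.groups()
    return f'{group}{{{labels},type="{item}"}} {value}'

old = 'storj_sat_audit_successCount{satellite="121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"} 66.0'
print(to_aggregated(old))
```

In a dashboard query the same change amounts to replacing storj_sat_audit_successCount with storj_sat_audit{type="successCount"}.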

Here are all the new metric names. I expect these to remain unchanged for as long as the API exposes them:

## aggregated
storj_total_diskspace
storj_total_bandwidth
storj_sat_summary
storj_sat_audit
storj_sat_uptime

## added in 0.2.0
storj_sat_month_egress
storj_sat_month_ingress
storj_sat_day_egress
storj_sat_day_ingress
storj_sat_month_storage
storj_sat_day_storage

storj_sat_month* - these metrics expose the sum of all daily records for the current month, per type ('repair', 'audit', 'usage' etc.)
storj_sat_day* - these metrics expose the values from the last daily record (current day), per type ('repair', 'audit', 'usage' etc.)
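The difference between the month and day metrics can be sketched in a few lines of Python. The shape of daily_records below (one dict per day, keyed by type) and the numbers are assumptions for illustration only; the real storagenode API response may be structured differently.

```python
def month_and_day(daily_records):
    """Return (month totals per type, last day's values per type)."""
    month = {}
    for record in daily_records:
        for kind, value in record.items():
            # storj_sat_month*: sum of all daily records in the current month
            month[kind] = month.get(kind, 0.0) + value
    # storj_sat_day*: values from the most recent daily record only
    day = dict(daily_records[-1]) if daily_records else {}
    return month, day

# Hypothetical daily records, oldest first.
records = [
    {"repair": 1.0, "audit": 2.0, "usage": 3.0},
    {"repair": 4.0, "audit": 5.0, "usage": 6.0},
]
month, day = month_and_day(records)
print(month)
print(day)
```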

To update to latest version run the following:

docker pull anclrii/storj-exporter:latest ; \
docker stop storj-exporter ; \
docker rm storj-exporter ; \
docker run -d --restart=unless-stopped --link=storagenode --name=storj-exporter -p 9651:9651 anclrii/storj-exporter:latest # or your custom options

I’m still drafting the Grafana board, stay tuned …

Cmdrd · October 8, 2019, 3:57am

Working as expected, thanks a lot for putting this together! Hopefully Storj provides good documentation for the Storage Node API going forward to utilize for this. I tried enabling the debug port as per the town hall Q/A suggestion but the metrics being exposed are not in this type of format so were not really usable for ingestion into Prometheus, but this solves that problem.

I have built a systemd service for the standalone script and am just making sure that it works properly. After that I will probably submit a PR, if you want it. It is pretty simple, but it saves some digging and testing for anyone wanting to use the standalone script.

greener · October 8, 2019, 10:17pm

I’m glad it works for you!
Systemd service sounds great, please do raise a PR.

greener · October 8, 2019, 10:27pm

I just published the Grafana Storj Exporter dashboard that I use with this exporter. It took a while to figure out the units and formulas for the API numbers, and some bits might still be off, but most items seem accurate. Do not rely on the Earnings tile too much: I have only had 1 payout so far and will confirm the numbers on the next round.

Storj-exporter Grafana dashboard to visualise Storj-Exporter metrics for multiple Storj storage nodes.

combined dashboard

details dashboard

kevink · October 9, 2019, 6:06am

This is quite awesome!

I pull stats every 10 minutes and don’t get a smooth graph that way. Any way to fix this or do I really need to pull stats every minute?
(image)

kevink · October 9, 2019, 6:09am

Also there’s a slight problem if you run multiple nodes on the same host (same ip, different ports) in the details view as those get combined to one node:

greener · October 9, 2019, 6:29am

You would need to raise the rate intervals from [5m] to at least [10m]. I’ll try to move this into a variable so that it can be selected for all graphs.
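The reason the range matters is that rate() needs at least two samples inside the range window to produce a value. The Python sketch below just counts how many scrapes fall inside a window; it assumes scrapes land exactly at multiples of the scrape interval and simplifies the window boundary handling, so treat it as a back-of-the-envelope model rather than how Prometheus is implemented.

```python
def samples_in_window(scrape_interval_s, range_s, eval_time_s):
    """Count scrapes (at 0, interval, 2*interval, ...) inside (eval - range, eval]."""
    count = 0
    t = 0
    while t <= eval_time_s:
        if t > eval_time_s - range_s:
            count += 1
        t += scrape_interval_s
    return count

# 10m scrapes with a [5m] range: never more than one sample, so no rate.
print(samples_in_window(600, 300, 3700))
# 10m scrapes with a [60m] range: several samples in every window.
print(samples_in_window(600, 3600, 3700))
```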

kevink · October 9, 2019, 6:48am

Thanks. Interestingly this only changed some graphs and even that not perfectly (10m scrape interval, 60m rate interval in graph):
(image)
Looks like I should change my scrape interval to 5 minutes and check again after a while.

greener · October 9, 2019, 6:53am

Hmm… Did you update queries for each graph?

kevink · October 9, 2019, 6:54am

Yes, I replaced every occurrence of [5m] with [60m]

Edit: With scrape interval 5 minutes it works correctly. With 10 minutes not even the graph in prometheus directly was smooth so maybe it has something to do with the exporter?

greener · October 9, 2019, 8:08pm

@kevink Here’s what I found - https://github.com/prometheus/prometheus/blob/81d284f806ef828e8f323088e00871fedd5a77c2/docs/querying/basics.md#staleness

It seems Prometheus treats samples more than 5m old as stale by default. For Prometheus v2+ you can override this interval with the --query.lookback-delta command-line option (5m is the default). Raising it should help with the graphs, but it will also delay the graph showing that a node has stopped reporting.

In general they recommend keeping the scrape interval below 2m with the default lookback-delta=5m, which probably means you want to set lookback-delta=25m with 10m scrapes… I didn’t test this though, and I’m not sure it is a good idea.

Also check this post https://stackoverflow.com/questions/56882734/grafana-dashboard-not-showing-data-when-zoomed-out
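To show why a 10m scrape interval leaves gaps with the default lookback, here is a simplified Python model of an instant lookup: at each evaluation time, take the newest sample that is no older than the lookback delta. The sample timestamps are invented, and the model ignores staleness markers and exact boundary rules, so it is a sketch of the idea only.

```python
def latest_sample_in_window(sample_times_s, eval_time_s, lookback_s):
    """Return the newest sample time within the lookback window, or None."""
    candidates = [
        t for t in sample_times_s
        if eval_time_s - lookback_s <= t <= eval_time_s
    ]
    return max(candidates) if candidates else None

scrapes = [0, 600, 1200]  # hypothetical 10m scrapes, in seconds

# Default 5m lookback: 400s after the first scrape nothing is found (a gap).
print(latest_sample_in_window(scrapes, 400, 300))
# 25m lookback: the same instant still sees the sample from t=0.
print(latest_sample_in_window(scrapes, 400, 1500))
```

The second call also hints at the trade-off mentioned above: with a long lookback, a node that stopped reporting keeps showing its last value for that long.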

greener · October 9, 2019, 8:19pm

For your other problem try to change $node variable regex to /.*/ instead of /([^:]+):.*/
This will render ports in $node variables and should fix your issue.

The reason I strip the port is that I’m also using other exporters (node-exporter and cAdvisor) with this same dashboard, and each exporter has a different port on the same host.
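The effect of the two variable regexes can be reproduced with a short Python sketch. It mimics the behaviour that a capture group, when present, becomes the variable value and duplicates are collapsed. The instance labels are hypothetical (two nodes on one host), and node_values is an illustrative helper, not a Grafana API.

```python
import re

def node_values(instances, pattern):
    """Apply a variable regex to instance labels and return the unique values."""
    values = []
    for inst in instances:
        m = re.search(pattern, inst)
        if m is None:
            continue
        # Use the first capture group if the pattern has one, else the whole match.
        value = m.group(1) if m.re.groups else m.group(0)
        if value not in values:
            values.append(value)
    return values

# Hypothetical: two storage nodes on the same IP, different exporter ports.
instances = ["192.168.1.10:9651", "192.168.1.10:9652"]
print(node_values(instances, r"([^:]+):.*"))  # port stripped, nodes merge into one
print(node_values(instances, r".*"))          # port kept, nodes stay separate
```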

kevink · October 9, 2019, 8:25pm

Oh that is interesting, thanks a lot, I didn’t think about that.
Scraping that often will make the database quite big if I want to keep all my stats for a year… I’ll have to read the links carefully and think about how I want to set up my system…

kevink · October 9, 2019, 8:25pm

For your other problem try to change $node variable regex to /.*/ instead of /([^:]+):.*/
This will render ports in $node variables and should fix your issue.

Thanks, that worked perfectly.

greener · October 9, 2019, 9:54pm

Updated the grafana dashboard to match units to the recent update:

greener · October 13, 2019, 4:05pm

v0.2.3 update

remove duplicate labels
fix failing on missing sat api data

To update to latest version run the following:

docker pull anclrii/storj-exporter:latest ; \
docker stop storj-exporter ; \
docker rm storj-exporter ; \
docker run -d --restart=unless-stopped --link=storagenode --name=storj-exporter -p 9651:9651 anclrii/storj-exporter:latest # or your custom options

Watchtower might also pick up the update, if configured.

Last time I forgot to tag the new version as latest, so the update might not have worked. It should work this time.

Grafana dashboard update

show score * 100 as percent for uptime and audits

kevink · October 13, 2019, 5:33pm

Thanks a lot for the update!
I won’t be able to confirm the fix, since that satellite now has data for my node.