So with the new update I noticed that I can query new metrics in Prometheus that aren't from the exporter directly but are processed through netdata:
e.g. netdata_prometheus_storj_payout_currentMonth_currentMonth_average
How can we prevent this? Firstly, it results in storing the metrics twice in different namespaces, and secondly, netdata polls the endpoint every 5 seconds, causing high CPU usage spikes on the exporter and the storage node.
This probably has to do with netdata; there's no netdata integration in storj-exporter, and since I'm not using netdata I can't confirm what those netdata_* metrics are.
As for the endpoint polling, I recall we discussed this in a GitHub issue; it happens if you point netdata at the storj-exporter port for health checks. Currently the exporter returns metrics on any URL; I will limit it to /metrics to prevent CPU drain from port health checks etc. That's up next.
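For illustration, the restriction I have in mind would look roughly like this with prometheus_client (a sketch, not the actual exporter code; 9651 is just the exporter's usual default port):

```python
# Rough sketch: serve metrics only on /metrics, so health checks and
# probes hitting other paths don't trigger a full collection run.
from http.server import HTTPServer
from prometheus_client import MetricsHandler

class RestrictedMetricsHandler(MetricsHandler):
    def do_GET(self):
        if self.path.split("?")[0] == "/metrics":
            super().do_GET()  # normal prometheus_client exposition
        else:
            # Cheap 404 for "/", health checks and random probes.
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # 9651 is the exporter's usual port; adjust as needed.
    HTTPServer(("", 9651), RestrictedMetricsHandler).serve_forever()
```

Anything other than /metrics would then get an inexpensive 404 instead of kicking off metric collection.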
Yep, doing this slowly. So far I have added a boom table for individual nodes with a breakdown of traffic/audit/uptime etc. per satellite. I'm also going to replace the payout formulas with the new payout metrics, which will be more precise, and add any new metrics I find useful.
Oh, you are not using netdata…
Yes, it has nothing to do with the exporter directly; it is something netdata does by default. I don't have anything configured for that.
I just figured that most people are using netdata, so they will have a similar experience to mine.
I'm not sure it is related to health checks, since netdata is actually pulling the exporter's data.
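For anyone else seeing this: as far as I can tell, netdata's go.d prometheus collector auto-discovers /metrics endpoints, so one way to stop it (untested on my side) should be to disable that collector:

```yaml
# /etc/netdata/go.d.conf -- untested sketch: turn off the go.d
# prometheus collector so netdata stops auto-scraping discovered
# /metrics endpoints like storj-exporter
modules:
  prometheus: no
```

and then restart netdata.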
I have a pretty new node running, and some information seems to be missing / not showing up… Is that because of the node's age (so the data just doesn't exist yet), or did I configure something wrong?
Also, which things exactly fire off, and how often? I am running the Python file directly for the exporter (how often does that scrape my logs?). I understand the Prometheus server collects this information (how often does that fire?), and lastly Grafana reads from the Prometheus server (how often does that happen?). And where can I tweak the timing of each of those?
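From skimming the docs, I think the Prometheus side at least lives in prometheus.yml (sketch below; the job name and target are placeholders for my setup), but I'm not sure about the exporter and Grafana sides:

```yaml
# prometheus.yml -- where the Prometheus pull interval lives (I think).
global:
  scrape_interval: 1m          # default for every scrape job
scrape_configs:
  - job_name: 'storj-exporter'
    scrape_interval: 30s       # per-job override
    static_configs:
      - targets: ['localhost:9651']
```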
I got rid of netdata polling the exporter endpoint by using docker-compose to connect the exporter to Prometheus without exposing any port to the host. This significantly reduced the CPU spikes.
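Roughly what that looks like (a trimmed sketch of my compose file; image tags and the exporter's node-connection settings are omitted and may differ from yours):

```yaml
# docker-compose.yml -- trimmed sketch
version: "3"
services:
  storj-exporter:
    image: anclrii/storj-exporter:latest
    # no "ports:" entry, so nothing is published on the host and
    # netdata can no longer discover or poll the endpoint
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
# both services sit on the default compose network, so Prometheus can
# still scrape the exporter internally at http://storj-exporter:9651
```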
Key differences from the "old" dashboard: replaced the per-node vetting stats with the online score, fixed uptime in the node overview, split net out into egress and repair egress, and fixed the monthly payout calculation (still based on the old data rather than the new payout metrics, so this dashboard includes the held amount).
Thanks for this. I'm just getting started with Storj-Exporter (unfortunately running a container for each of my 3 nodes; if there's a better way, please let me know). One comment, and this could just be me, but on the "Storage Used" and Ingress/Egress graphs, are the Storage Sum and I/O labels backwards? Shouldn't they be on the other sides of the charts?
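In case it helps to see what I mean, my current setup is roughly this (a sketch; "node1".."node3" are placeholder hostnames, and I'm going by the README for the STORJ_HOST_ADDRESS variable):

```yaml
# docker-compose.yml -- rough sketch of my per-node setup
version: "3"
services:
  storj-exporter-node1:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node1   # placeholder node hostname
  storj-exporter-node2:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node2
  storj-exporter-node3:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node3
```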