So with the new update I noticed that I can query new metrics in Prometheus that aren't from the exporter directly but are processed through netdata:
e.g. netdata_prometheus_storj_payout_currentMonth_currentMonth_average
How can we prevent this? Firstly, it results in storing the metrics twice in different namespaces, and secondly, netdata polls the endpoint every 5 seconds, causing high CPU usage spikes on the exporter and the storage node.
This probably has to do with netdata; there's no netdata integration in storj-exporter, and since I'm not using netdata I can't confirm what those netdata_* metrics are.
As for the endpoint polling, I recall we discussed this in a GitHub issue; it happens if you point netdata at the storj-exporter port for health checks. Currently the exporter returns metrics on any URL; I will limit it to /metrics to prevent CPU drain from port health checks etc. That's up next.
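For illustration, the restriction I have in mind would look roughly like this with prometheus_client (a sketch, not the actual exporter code; 9651 is just the exporter's usual default port):

```python
# Rough sketch: serve metrics only on /metrics, so health checks and
# probes hitting other paths don't trigger a full collection run.
from http.server import HTTPServer
from prometheus_client import MetricsHandler

class RestrictedMetricsHandler(MetricsHandler):
    def do_GET(self):
        if self.path.split("?")[0] == "/metrics":
            super().do_GET()  # normal prometheus_client exposition
        else:
            # Cheap 404 for "/", health checks and random probes.
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # 9651 is the exporter's usual port; adjust as needed.
    HTTPServer(("", 9651), RestrictedMetricsHandler).serve_forever()
```

Anything other than /metrics would then get an inexpensive 404 instead of kicking off metric collection.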
Yep, doing this slowly. So far I have added a boom table for individual nodes with a breakdown of traffic/audit/uptime etc. per satellite. I'm also going to replace the payout formulas with the new payout metrics, which will be more precise, and add any new metrics I find useful.
Oh, you are not using netdata…
Yes, it has nothing to do with the exporter directly; it is something netdata does by default. I don't have anything configured for that.
I just figured that most people are using netdata, so they will have a similar experience to mine.
I'm not sure it is related to health checks, since netdata is actually pulling the exporter's data.
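For anyone else seeing this: as far as I can tell, netdata's go.d prometheus collector auto-discovers /metrics endpoints, so one way to stop it (untested on my side) should be to disable that collector:

```yaml
# /etc/netdata/go.d.conf -- untested sketch: turn off the go.d
# prometheus collector so netdata stops auto-scraping discovered
# /metrics endpoints like storj-exporter
modules:
  prometheus: no
```

and then restart netdata.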
I have a pretty new node running, and some information seems to be missing / not showing up… Is that because of the node's age (so the data just doesn't exist yet), or did I configure something wrong?
Also, which things exactly fire off, and how often? I am running the Python file directly for the exporter (how often does that scrape my logs?). I understand the Prometheus server collects this information (how often does that fire?), and lastly Grafana reads from the Prometheus server (how often does that happen?). And where can I tweak the timing of each of those?
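From skimming the docs, I think the Prometheus side at least lives in prometheus.yml (sketch below; the job name and target are placeholders for my setup), but I'm not sure about the exporter and Grafana sides:

```yaml
# prometheus.yml -- where the Prometheus pull interval lives (I think).
global:
  scrape_interval: 1m          # default for every scrape job
scrape_configs:
  - job_name: 'storj-exporter'
    scrape_interval: 30s       # per-job override
    static_configs:
      - targets: ['localhost:9651']
```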
I got rid of netdata polling the exporter endpoint by using docker-compose to connect the exporter to Prometheus without exposing any port to the host. This significantly reduced the CPU spikes.
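Roughly what that looks like (a trimmed sketch of my compose file; image tags and the exporter's node-connection settings are omitted and may differ from yours):

```yaml
# docker-compose.yml -- trimmed sketch
version: "3"
services:
  storj-exporter:
    image: anclrii/storj-exporter:latest
    # no "ports:" entry, so nothing is published on the host and
    # netdata can no longer discover or poll the endpoint
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
# both services sit on the default compose network, so Prometheus can
# still scrape the exporter internally at http://storj-exporter:9651
```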
Key differences from the "old" dashboard: replaced the per-node vetting stats with the online score, fixed uptime in the node overview, split net out into egress and repair egress, and fixed the monthly payout calculation (still based on the old data rather than the new payout metrics, so this dashboard includes the held amount).
Thanks for this. I'm just getting started with Storj-Exporter (unfortunately running a container for each of my 3 nodes; if there's a better way, please let me know). One comment, and this could just be me, but on the "Storage Used" and Ingress/Egress graphs, are the Storage Sum and I/O labels backwards? Shouldn't they be on the other sides of the charts?
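In case it helps to see what I mean, my current setup is roughly this (a sketch; "node1".."node3" are placeholder hostnames, and I'm going by the README for the STORJ_HOST_ADDRESS variable):

```yaml
# docker-compose.yml -- rough sketch of my per-node setup
version: "3"
services:
  storj-exporter-node1:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node1   # placeholder node hostname
  storj-exporter-node2:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node2
  storj-exporter-node3:
    image: anclrii/storj-exporter:latest
    environment:
      - STORJ_HOST_ADDRESS=node3
```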