How to monitor all nodes in your LAN using prometheus + grafana [linux using docker]

kevink · January 3, 2021, 11:21am

1. Setting up prometheus

For storing all statistics of the nodes, we use prometheus. So choose one PC/Server where you want to run prometheus. It doesn’t need to be a powerful PC and it doesn’t need much space, you can decide how much space prometheus is allowed to use or how long you want to store your statistics. However, make sure you choose a PC (or a directory) that doesn’t write to an SD card as that would shorten its lifespan significantly.
On that chosen PC we first create a prometheus.yml file in a location you prefer:

# Sample config for Prometheus.

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

Then we create a prometheus container:

sudo docker run -d -p 9090:9090 --restart unless-stopped --user 1000:1000 --name prometheus \
	-v /sharedfolders/config/prometheus.yml:/etc/prometheus/prometheus.yml \
	-v /sharedfolders/prometheus:/prometheus \
	prom/prometheus --storage.tsdb.retention.time=360d --storage.tsdb.retention.size=30GB \
	--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus

What you need to change in this command:

The paths “/sharedfolders/…” behind the “-v” arguments are my local paths. You have to change them according to where you just created the configuration file “prometheus.yml” and where you want the directory for the prometheus data. You could use a docker volume for storing that data but personally I prefer a directory. Make sure the directory exists and is owned by your first user.
The --storage.tsdb.retention.time is set to 360 days. Decide for yourself how long you want to store stats.
The --storage.tsdb.retention.size=30GB configures how big the prometheus database is allowed to get. If it gets bigger, it starts deleting old entries.

The argument --user 1000:1000 makes the container run as the first user of the PC (e.g. pi on an RPI) instead of root. You can remove it if you prefer to run your containers as root, but I’d recommend running all containers as a user unless they only work as root. All containers in this How-To, including the storagenode, can run as a user. Warning: Don’t switch an existing storagenode from root to a user! It will mess up your storagenode because the filesystem permissions of its existing files will still be wrong.

Don’t change the other arguments in the command.

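Docker creates missing directories given to -v on the first start of the container, but it creates them as root, so a container running with --user 1000:1000 cannot write to them. That is why the data directory should exist with the right owner beforehand. Docker does not create missing files, and prometheus does not create a default configuration, so prometheus.yml has to exist as well. If you chose a location other than the OS hard drive, you might want to use --mount for the prometheus storage directory like you do with the storagenodes, so that a HDD disconnect doesn’t result in the database being written to your OS drive (in which case those directories have to be present before you start the container).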
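
The preparation described above can be scripted. This is only a minimal sketch: the function name prepare_prometheus_paths is made up for this example, the paths in the usage line are the ones from my command above, and 1000:1000 is assumed to match the --user argument. It stops if the config is not a regular file and creates the data directory before Docker gets the chance to create it as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prometheus or Docker): prepare the host
# paths that the "-v" arguments point to, before the first container start.
# $1 = config file on the host, $2 = data directory on the host,
# $3 = owner as uid:gid (optional; giving files to another user needs root).
prepare_prometheus_paths() {
    config_file=$1
    data_dir=$2
    owner=${3:-}

    # The config must already exist as a regular file (not as a directory).
    if [ ! -f "$config_file" ]; then
        echo "missing config file: $config_file" >&2
        return 1
    fi

    # Create the data directory ourselves so Docker does not create it as root.
    mkdir -p "$data_dir" || return 1

    # Hand both paths to the user the container runs as, if requested.
    if [ -n "$owner" ]; then
        chown "$owner" "$config_file" "$data_dir" || return 1
    fi
    echo "ready: config=$config_file data=$data_dir"
}

# Usage with the paths from this How-To (from a root shell):
# prepare_prometheus_paths /sharedfolders/config/prometheus.yml /sharedfolders/prometheus 1000:1000
```

If the function reports a missing config file, create prometheus.yml first and run it again.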

Now you can check if prometheus is running by visiting: http://<“ip of your pc”>:9090

2. Setting up storagenode prometheus exporters

The prometheus exporter for storagenodes is made by @greener and the repository has all information needed: https://github.com/anclrii/Storj-Exporter (Prometheus exporter for monitoring Storj storage nodes)
There is also a thread on this forum about the exporter:

In this How-To we run the exporter on the device that runs the storagenode. It’s also possible to run it on the device running prometheus instead but I think this is more convenient. Also we’ll be using docker to run the exporter as it is easier.

So in short, do this on every PC that runs a node and run it for every node on that PC (with the required changes described below):

For x86/64 systems run this:

sudo docker run -d --restart=unless-stopped --link=storagenode --name=storj-exporter \
-p 9651:9651 -e STORJ_HOST_ADDRESS="storagenode" anclrii/storj-exporter:latest

For arm-devices you need to build the image yourself:

git clone https://github.com/anclrii/Storj-Exporter
cd Storj-Exporter
sudo docker build -t storj-exporter .
sudo docker run -d --restart=unless-stopped --link=storagenode --name=storj-exporter \
-p 9651:9651 -e STORJ_HOST_ADDRESS="storagenode" storj-exporter

What do you need to change in this command? (for all architectures)

--link=storagenode: “storagenode” needs to be the name of the container of the storagenode you want to link to
--name: choose an appropriate name for the container; it doesn’t really matter what, it’s just for you (and needed in the next step)
-e STORJ_HOST_ADDRESS: “storagenode” needs to be the name of the container of the storagenode you want to link to
-p 9651:9651: if you run multiple nodes on one device, you need to change the host port for the additional nodes, like -p 9652:9651

After starting the container you can check if it works by visiting: http://<“ip of node”>:9651 and it will print a lot of information.
If you run netdata on your PC, make sure to follow the netdata section in the storj-exporter repository to disable netdata’s polling of the exporter port as this will result in a heavy CPU load on your PC.
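
The browser check above can also be done from a terminal. A rough sketch, with two assumptions: the function name check_exporter_output is made up, and I assume the node metrics all start with "storj_" (check the exporter repository if your output looks different). It helps to tell apart an exporter that only prints general process metrics from one that actually reaches the node.

```shell
#!/bin/sh
# Hypothetical helper: read exporter output on stdin and report whether it
# contains node metrics. Assumption: node metric names start with "storj_".
check_exporter_output() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    count=$(grep -c '^storj_' || true)
    if [ "$count" -gt 0 ]; then
        echo "ok: $count storj_ metric lines"
    else
        echo "no storj_ metrics found; the exporter probably cannot reach the node" >&2
        return 1
    fi
}

# Usage (set NODE_IP to the address of the PC running the exporter):
# curl -s "http://$NODE_IP:9651/" | check_exporter_output
```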

3. Configuring prometheus to scrape the storagenode information

Edit your prometheus.yml with your favourite editor and add the following job to the scrape_configs section:

  - job_name: storagenode1
    scrape_interval: 30s
    scrape_timeout: 20s
    metrics_path: /
    static_configs:
      - targets: ["storj-exporter1:9651"]
        labels:
          instance: "node1"

What do you need to change in this section?

job_name: whatever you like
scrape_interval: how often data is being pulled from your node. 30s is a good value to see bandwidth data.
targets: change this to match the name and port of your storj-exporter container from the last step, like ["container_name:port"], if you want to add a local storj-exporter. If you want to add a storj-exporter running on a different machine, use the IP address of that machine, like ["ip-address:port"].
labels: instance: “node1”: choose any label you like, this will be the name shown in the prometheus data and on the grafana dashboard. It will be the name of your node everywhere. Make sure you can recognize the node from this name.

If you run multiple nodes on multiple devices, you have to add such a job section for each node on each device with the appropriate ip and port defined in the last step.
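
With many nodes, writing all those job sections by hand gets repetitive. The following is a small sketch that prints them for you to paste under scrape_configs. The function name scrape_job and the IP address 192.168.178.20 are placeholders I made up for the example; the ports are the ones from step 2.

```shell
#!/bin/sh
# Hypothetical helper: print one scrape job in the format used above.
# $1 = job name, $2 = target as host:port, $3 = instance label.
scrape_job() {
    cat <<EOF
  - job_name: $1
    scrape_interval: 30s
    scrape_timeout: 20s
    metrics_path: /
    static_configs:
      - targets: ["$2"]
        labels:
          instance: "$3"
EOF
}

# Example: two nodes on one device, exporters on ports 9651 and 9652.
# 192.168.178.20 is a placeholder for the PC running those nodes.
scrape_job storagenode1 192.168.178.20:9651 node1
scrape_job storagenode2 192.168.178.20:9652 node2
```

The two-space indentation of the output matches the scrape_configs section of the sample prometheus.yml, so it can be appended there as-is.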

Now restart the prometheus container to use the changed configuration.
Then visit http://<“ip”>:9090/targets where you can see all your job configurations and whether each exporter endpoint could be reached. Wait a while until all endpoints have been scraped to see if all your configurations work correctly.
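
The same target information is also available from the prometheus HTTP API at /api/v1/targets, which is handy if you prefer a terminal. A minimal sketch, assuming the JSON contains one "health" field per target; the function name summarize_target_health is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON from /api/v1/targets on stdin and count
# the targets per health state (for example up, down, unknown).
summarize_target_health() {
    grep -o '"health": *"[a-z]*"' |
        sed 's/.*"\([a-z]*\)"$/\1/' |
        sort | uniq -c |
        awk '{print $2 ": " $1}'
}

# Usage (set PROMETHEUS_IP to the address of the PC running prometheus):
# curl -s "http://$PROMETHEUS_IP:9090/api/v1/targets" | summarize_target_health
```

Every job that is listed as "down" here should be checked against the ip and port from step 2.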

4. Grafana

Install grafana on the PC you’re running Prometheus on (or on any other, but this one would be advantageous):

sudo docker run -d \
  -p 3000:3000 \
  --name=grafana \
  --restart=unless-stopped \
  --user=1000 \
  -v /sharedfolders/grafana:/var/lib/grafana \
  -e "GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource,yesoreyeram-boomtable-panel 1.3.0" \
   -e GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS="yesoreyeram-boomtable-panel" \
  grafana/grafana

What do you need to change in this command?

name: can be whatever you like
user: this is the same as explained for prometheus
-v /sharedfolders/… : change this path to wherever you want grafana to store your settings. You could use a docker volume but I prefer a path.

After starting grafana, visit http://<“ip-address”>:3000
You will then need to log in for the first time. The default user and password are both “admin”. You’ll have to change the password afterwards.

Configure the datasource:
[screenshot]

Should be automatically filled like this:
[screenshot]

Replace “localhost” with the IP address of your PC that is running prometheus.
Then Save&Test.
The overview of the added datasource should then look like mine, where you can see the IP address of my host “http://192.168.178.10:9090”.

Now you need a dashboard for your nodes:

Installation instructions:

Import Storj-Exporter-Boom-Table.json via your Grafana UI ("+" -> Import) and select your prometheus datasource at the top-left of the dashboard.

[screenshot]
Copy the contents of Storj-Exporter-Boom-Table.json from https://raw.githubusercontent.com/anclrii/Storj-Exporter-dashboard/master/Storj-Exporter-Boom-Table.json
and paste it into the JSON field of the import dialog.

Then load and import it. No further configuration of the datasource should be needed and it will show all your nodes. It might take a while until enough data has been collected for some graphs to show anything.

5. Enjoy

Hope this How-To helped you set up monitoring for all your nodes.
Let me know if a step was unclear or where you have problems and we’ll figure it out.

TheMightyGreek · January 3, 2021, 1:20pm

Very nice !
I set up the storj-exporter but it gives no information about the node, only overall information about the system.

Here’s my node command (I tried the dashboard port both with and without 127.0.0.1):
docker run -d --restart unless-stopped --stop-timeout 300 \
-p 28969:28967 \
-p 127.0.0.1:14002:14002 \
-e WALLET="###" \
-e EMAIL="###" \
-e ADDRESS="###:28969" \
-e STORAGE=100Gb \
--mount type=bind,source="/mnt/StorjHDD/identities/Identity3/storagenode",destination=/app/identity \
--mount type=bind,source="/mnt/StorjHDD/testNode",destination=/app/config \
--name storagenodeTestTB storjlabs/storagenode:latest

Here’s my exporter command:
sudo docker run -d --restart=unless-stopped --link=storagenodeTestTB --name=storj-exporter -p 9651:9651 -e STORJ_HOST_ADDRESS="storagenodeTestTB" storj-exporter

and it always gives me this result:

any idea of what I’m doing wrong ?

kevink · January 3, 2021, 1:47pm

hmm not sure, try running it without the "" (even though both should work fine):

sudo docker run -d --restart=unless-stopped --link=storagenodeTestTB --name=storj-exporter -p 9651:9651 -e STORJ_HOST_ADDRESS=storagenodeTestTB storj-exporter

Another option would be to use a standard name “storagenode” just to be sure (I’m reaching now… I have 5 nodes running with 5 names and it works fine…). Maybe for some reason it doesn’t like your name?
I wish the exporter would give more information… Or do you have something in the docker logs for the exporter that could help?

kevink · January 3, 2021, 1:50pm

Oh… it’s possible docker -v creates only missing directories but not files. Updated the how-to. thanks!

TheMightyGreek · January 3, 2021, 1:58pm

I tried removing the "" but nothing changed. I tried to access the logs with $ docker logs storj-exporter but it didn’t return anything.
Also prometheus doesn’t want to start, maybe because it’s an x86 build?
What should I change to get the armhf version?

Anyway thanks a lot for your help, I’ve wanted to set this up for a long time.

TheMightyGreek · January 3, 2021, 2:04pm

Alright I changed the container name to simply storagenode and now it works !
Still having problems with my prometheus container though, it just keeps rebooting…

kevink · January 3, 2021, 2:04pm

Weird… I have no pi node to check it though. I remember having had similar problems once but can’t remember how I actually solved it…
Weird that it works as “storagenode” which implies that it somehow ignores the name given in the docker run command… (unless there’s some spelling error somewhere?)

Prometheus has images for all architectures and should use the correct one automatically. If not, it should show something in the log. You could try running the prometheus command without “-d” so it’s not detached to show output to the console directly.

kevink · January 3, 2021, 2:18pm

So I found the problem. Prometheus doesn’t create a default configuration file… So you have to create it yourself first. I updated that part in the How-To. Sorry, I should have checked that when I wrote it. I just assumed every service would create its default configuration but maybe I missed something.

TheMightyGreek · January 3, 2021, 2:18pm

level=info ts=2021-01-03T14:09:00.584614977Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2021-01-03T14:09:00.584814685Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:52:47)"
level=info ts=2021-01-03T14:09:00.584899435Z caller=main.go:304 host_details="(Linux 4.14.180-178 #1 SMP PREEMPT Wed Sep 2 12:39:45 -03 2020 armv7l 450de4e2ab38 (none))"
level=info ts=2021-01-03T14:09:00.585006477Z caller=main.go:305 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-01-03T14:09:00.585091102Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-01-03T14:09:00.591874114Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2021-01-03T14:09:00.591895698Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-01-03T14:09:00.620724497Z caller=main.go:489 msg="Stopping scrape discovery manager..."
level=info ts=2021-01-03T14:09:00.620821331Z caller=main.go:503 msg="Stopping notify discovery manager..."
level=info ts=2021-01-03T14:09:00.620883498Z caller=main.go:525 msg="Stopping scrape manager..."
level=info ts=2021-01-03T14:09:00.620966581Z caller=main.go:499 msg="Notify discovery manager stopped"
level=info ts=2021-01-03T14:09:00.621033998Z caller=main.go:485 msg="Scrape discovery manager stopped"
level=info ts=2021-01-03T14:09:00.621148123Z caller=manager.go:736 component="rule manager" msg="Stopping rule manager..."
level=info ts=2021-01-03T14:09:00.621274373Z caller=manager.go:742 component="rule manager" msg="Rule manager stopped"
level=info ts=2021-01-03T14:09:00.621340873Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2021-01-03T14:09:00.621537749Z caller=main.go:679 msg="Notifier manager stopped"
level=info ts=2021-01-03T14:09:00.623626836Z caller=main.go:519 msg="Scrape manager stopped"
level=error ts=2021-01-03T14:09:00.623701669Z caller=main.go:688 err="opening storage failed: lock DB directory: open /prometheus/lock: permission denied"

that’s what I get now

kevink · January 3, 2021, 2:20pm

Did you create the prometheus storage location before starting the container? If not, it’ll be owned by root now and the container can’t write to it. Chown it to your main user (uid=1000).
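
To see who owns the storage location without reading ls output by eye, something like this could work. It is only a sketch: the function name owned_by_uid is made up, and the path in the usage line is the storage directory from my How-To command, so replace it with your own.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path is owned by the given
# numeric uid. "ls -ldn" prints the owner as a number in the third column.
owned_by_uid() {
    path=$1
    uid=$2
    if [ ! -e "$path" ]; then
        echo "$path does not exist" >&2
        return 1
    fi
    actual=$(ls -ldn "$path" | awk '{print $3}')
    if [ "$actual" = "$uid" ]; then
        echo "$path is owned by uid $uid"
    else
        echo "$path is owned by uid $actual, expected $uid" >&2
        return 1
    fi
}

# Usage with the storage directory from the How-To:
# owned_by_uid /sharedfolders/prometheus 1000 || sudo chown -R 1000:1000 /sharedfolders/prometheus
```

Note that it is the storage directory (the one mounted to /prometheus) that matters for the lock error, not the directory holding prometheus.yml.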

maxsch · January 3, 2021, 2:40pm

What about STORJ_API_PORT?

kevink · January 3, 2021, 2:41pm

The API_PORT is always 14002 because we link into the storagenode container directly. It would only be different if you change the container internal dashboard port (but nobody does that).

TheMightyGreek · January 3, 2021, 2:46pm

I chowned the prometheus directory but it still gives the same error:
level=error ts=2021-01-03T14:45:00.623701669Z caller=main.go:688 err="opening storage failed: lock DB directory: open /prometheus/lock: permission denied"

This is what ls -l should output, right? gabriele is the main user.

root@odroid:/mnt/StorjHDD# ls -l prometheus/
total 4
-rw-rw-r-- 1 gabriele gabriele 1252 Jan 3 14:35 prometheus.yml

kevink · January 3, 2021, 2:49pm

So your prometheus config location is /mnt/StorjHDD/prometheus/prometheus.yml?

What is your prometheus storage location?

TheMightyGreek · January 3, 2021, 2:51pm

exactly:
/mnt/StorjHDD/prometheus/prometheus.yml

kevink · January 3, 2021, 2:53pm

please post your run command

TheMightyGreek · January 3, 2021, 2:54pm

sudo docker run -p 9090:9090 --restart unless-stopped --user 1000:1000 --name prometheus \
-v /sharedfolders/config/prometheus.yml:/mnt/StorjHDD/prometheus/ \
-v /sharedfolders/prometheus:/prometheus \
prom/prometheus --storage.tsdb.retention.time=360d --storage.tsdb.retention.size=30GB \
--config.file=/mnt/StorjHDD/prometheus/prometheus.yml --storage.tsdb.path=/prometheus

TheMightyGreek · January 3, 2021, 3:42pm

I restarted from the beginning and here is my command:
root@odroid:/# sudo docker run -p 9090:9090 --restart unless-stopped --user 1000:1000 --name prometheus -v /sharedfolders/config/prometheus.yml:/mnt/StorjHDD/prometheus/prometheus.yml -v /sharedfolders/prometheus:/prometheus prom/prometheus --storage.tsdb.retention.time=360d --storage.tsdb.retention.size=30GB --config.file=/mnt/StorjHDD/prometheus/prometheus.yml --storage.tsdb.path=/prometheus

but it gives the following error:
level=error ts=2021-01-03T15:35:59.685Z caller=main.go:289 msg="Error loading config (--config.file=/mnt/StorjHDD/prometheus/prometheus.yml)" err="read /mnt/StorjHDD/prometheus/prometheus.yml: is a directory"

TheMightyGreek · January 3, 2021, 3:42pm

I recreated the prometheus.yml file from scratch as per the instructions.

kevink · January 3, 2021, 3:45pm

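You changed the wrong side of the -v arguments. The left side is the path on your host, that’s the part to change. The right side is the path inside the container, so don’t change that.

Don’t change --config.file either. It’s a path inside the container too.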
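
To make the two sides harder to mix up, here is a sketch that prints the command from step 1 with only the host paths filled in. The function name prometheus_run_command is made up, and /mnt/StorjHDD/prometheus-data is just an example data directory I invented; the config path is the one you posted. Everything to the right of the colons is left exactly as in the How-To.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run command from step 1 for the given
# host paths. Only the left side of each "-v" (the host path) varies; the
# paths inside the container stay exactly as in the How-To.
# $1 = config file on the host, $2 = data directory on the host.
prometheus_run_command() {
    cat <<EOF
sudo docker run -d -p 9090:9090 --restart unless-stopped --user 1000:1000 --name prometheus \\
  -v $1:/etc/prometheus/prometheus.yml \\
  -v $2:/prometheus \\
  prom/prometheus --storage.tsdb.retention.time=360d --storage.tsdb.retention.size=30GB \\
  --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus
EOF
}

# The config path is the one from this thread; the data directory
# /mnt/StorjHDD/prometheus-data is a made-up example, use your own.
prometheus_run_command /mnt/StorjHDD/prometheus/prometheus.yml /mnt/StorjHDD/prometheus-data
```

This only prints the command so you can check it before running it. Both host paths have to exist and belong to uid 1000 first.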