Loki Storj-Exporter-Logs and Storj-Logs dashboard (Update 2021-05-06)

fmoledina · February 8, 2021, 7:40pm

Hello all,

I’ve developed a dashboard that builds on the popular Storj-Exporter and Storj-Log-Exporter dashboards and uses Grafana Loki and Promtail as an alternative way of scraping log files and generating metrics. The main added benefit over the existing toolchain is that the log entries themselves can be shown in Grafana dashboards, so logs for all nodes can be viewed in one place.

Storj-Exporter-Logs dashboard:

Storj-Logs dashboard:

Motivation

I’ve been interested in exploring Grafana Loki with Promtail for log ingestion and metrics for a number of different services on my home server. Testing it out for Storj nodes seemed like a great way to get an understanding of how it works.

For Storj nodes, the main benefit is that a single Promtail listener can ingest logs from multiple nodes and produce metrics that Prometheus can then scrape. Individual storj-log-exporter instances are not required.

Furthermore, once the logs are ultimately shipped to Loki, one can do LogQL queries against them in Grafana or using LogCLI. For example, to search for all ERROR log level entries:
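The sketch below is not LogQL. It is a rough offline analogue in Python of the same idea (keep only ERROR-level entries), assuming the JSON log format described under Installation, where the level is stored in the "L" field. The function name error_entries and the sample lines are made up for illustration; the ERROR line in particular is not taken from a real node log.

```python
import json

def error_entries(lines):
    """Yield parsed log entries whose level field "L" equals "ERROR"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if isinstance(entry, dict) and entry.get("L") == "ERROR":
            yield entry

# Hypothetical sample lines, for illustration only.
sample = [
    '{"L":"INFO","T":"2021-02-04T15:47:50.240Z","N":"piecestore","M":"download started"}',
    '{"L":"ERROR","T":"2021-02-04T15:48:01.000Z","N":"piecestore","M":"download failed"}',
    'not json',
]

errors = list(error_entries(sample))
for e in errors:
    print(e["T"], e["M"])
```

In Grafana the same filtering is done by Loki at query time across all nodes, which is the point of shipping the logs there; this snippet only helps to sanity-check a single log file by hand.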

Installation

Edit Storj Node Logging Parameters

Note: This dashboard requires JSON log output from the storagenode service rather than the familiar console output, so the log lines look different from what you may be used to. However, common tools such as successrate.sh appear to be able to parse JSON log files without issue.

An example log line item in JSON format is as follows:

{"L":"INFO","T":"2021-02-04T15:47:50.240Z","N":"piecestore","M":"download started","Piece ID":"H5MHLOAWQBOZKVABCF5SXROFFON2HSKZNSMFJSICIRDEFNRKZVBA","Satellite ID":"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE","Action":"GET"}
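To make the field layout concrete, here is a minimal Python sketch that parses the example line above into named fields. The mapping of the short keys (L = level, T = timestamp, N = logger name, M = message) is my reading of the example, and parse_entry is a hypothetical helper, not part of any Storj or Promtail tooling.

```python
import json
from datetime import datetime, timezone

# The example log line from above, split across lines for readability.
LINE = (
    '{"L":"INFO","T":"2021-02-04T15:47:50.240Z","N":"piecestore",'
    '"M":"download started",'
    '"Piece ID":"H5MHLOAWQBOZKVABCF5SXROFFON2HSKZNSMFJSICIRDEFNRKZVBA",'
    '"Satellite ID":"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE",'
    '"Action":"GET"}'
)

def parse_entry(line):
    """Parse one JSON log line into a dict with descriptive key names."""
    raw = json.loads(line)
    # Timestamps in the example are UTC with millisecond precision.
    ts = datetime.strptime(raw["T"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "level": raw["L"],
        "time": ts.replace(tzinfo=timezone.utc),
        "logger": raw.get("N"),
        "message": raw["M"],
        "action": raw.get("Action"),            # not present on every line
        "satellite_id": raw.get("Satellite ID"),
    }

entry = parse_entry(LINE)
print(entry["level"], entry["logger"], entry["message"], entry["action"])
```

Promtail does the equivalent extraction in its own pipeline configuration; the snippet is only meant to show which fields are available to build labels and metrics from.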

Also, the Promtail configuration in this repo requires the Storj logs to be exported to a log file. An alternative Promtail configuration that uses the Loki Docker driver to ship logs directly is included in my full Storj docker-compose stack repo at fmoledina/docker-storj-config.

To set the log output to JSON encoding and to specify a log file, proceed with the following steps:

Stop the storagenode.
Edit the node config.yaml and set the following log.encoding and log.output parameters:

# configures log encoding. can either be 'console', 'json', or 'pretty'.
log.encoding: "json"

# can be stdout, stderr, or a filename
log.output: /mnt/storj/logs/storj01.log  # modify as required

Configure Promtail

Edit ./appconfig/promtail/config.yml for nodename and __path__ values, where __path__ is the log file configured in the Storj node config.yaml.

#### Edit the node labels below to suit your configuration #####

## Storj node logging to Loki + Prometheus metrics generation
## Requires log.encoding="json"
## Ensure nodename is the same as in Prometheus config for Storj-Exporter
- job_name: storj
  static_configs:
  - targets:
      - localhost
    labels:
      job: storj
      nodename: storj01   # Same as prometheus config
      __path__: /mnt/storj/logs/storj01.log
  # - targets:
  #     - localhost
  #   labels:
  #     job: storj
  #     nodename: storj02
  #     __path__: /mnt/storj/logs/storj02.log

Configure Prometheus

Edit your Prometheus config to adjust Storj-Exporter and add Promtail scrape configs. Note that the label used is nodename rather than instance as in Storj-Exporter and storj-log-exporter instructions. This is to align with the corresponding nodename variable provided by Promtail for Storj node log files.

Add nodename to your existing job for each node:

  - job_name: 'storj-exporter'
    static_configs:
      ...
      - targets: ["storj01-exporter:9651"]  # adjust for your Storj-Exporter installation
        labels:
          nodename: "storj01"   # Same as promtail config
          instance: "storj01"   # Allows compatibility with existing Storj-Exporter-Dashboard
      # - targets: ["storj02-exporter:9651"]
      #   labels:
      #     nodename: "storj02"
      #     instance: "storj02"
      ...

Also add the Promtail metrics endpoint, where all the log metrics will be scraped from:

  - job_name: 'promtail'
    static_configs:
      - targets: ['promtail:9080']  # adjust for your docker configuration (e.g. localhost:9080 if forwarding ports)

Start Loki and Promtail

Start the services with the following docker run commands:

# Run from the repo root. "$(pwd)" is used because older Docker versions require absolute host paths for -v.
docker run -d --name promtail -p 9080:9080 -v /mnt/storj/logs:/mnt/storj/logs -v "$(pwd)/appconfig/promtail":/config -v /path/to/appdata/promtail:/data grafana/promtail:2.1.0 -config.file=/config/config.yml
docker run -d --name loki -p 3100:3100 -v "$(pwd)/appconfig/loki":/config -v /path/to/appdata/loki:/data grafana/loki:2.1.0 -config.file=/config/local-config.yaml

Alternatively, use the quick-start guide available at the Storj-Exporter-dashboard repo and add the services from the docker-compose.yml in this repo.

Start all the services in the combined docker-compose.yml:

docker-compose up -d

Add dashboard to Grafana

Add the dashboard from the files dashboard-exporter-logs.json and dashboard-logs.json in the same way as described in KevinK’s How-To monitor all nodes in your lan:

You’ll need to create the connection to the Loki datasource in Grafana and select that datasource along with Prometheus when loading these dashboards.

Notes

The notes in the Notes section of Storj-Log-Exporter also apply to the Promtail log metrics approach presented in this repo.

Logging Limits

Loki has been configured with a 30-day log retention time. Adjust retention_period under table_manager in ./appconfig/loki/local-config.yaml if you want a different retention time.

fmoledina · March 12, 2021, 5:21am

Updated the Promtail config and dashboard with a minor release.

Changelog

v0.0.2

Add delete_expired metric
Update dashboards

fmoledina · May 6, 2021, 4:21pm

Updated the Promtail config with a minor release.

Changelog

v0.0.3

Update satellite names
Update delete_successful metric to match the new log message “delete piece sent to trash”

kevink · May 6, 2021, 6:14pm

Has the message “deleted” been permanently renamed to “delete piece sent to trash”? AFAIK this was only a temporary security measure until the bug has been resolved.

fmoledina · May 6, 2021, 6:36pm

No clue. I’m coming back to this after a bit of a hiatus so I’m not fully up to speed. Just noticed on my dashboard that deletes weren’t showing up.

What’s the link to the bug description?

kevink · May 6, 2021, 6:56pm

fmoledina · May 6, 2021, 10:03pm

Thanks for the background. I now also have context for what the error message would look like. I’ll keep the change until they revert the message back to “deleted”. That way my dashboard populates the deletions correctly.

Another change I’ve made that I haven’t published yet is having the Delete successful tile include the sum of delete_successful and delete_expired in the numerator, considering them both to be successful events. It’s resulting in a 100% score for that calculation. So my formula is:

delete_successful_perc = (successful + expired) / (successful + expired + failed)

I’ll keep an eye on it for a bit before making the commit.
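For anyone who wants to check the arithmetic outside Grafana, here is a minimal Python sketch of that formula. The function and argument names are just illustrative, and returning None when there are no delete events at all is my own assumption, not something the dashboard necessarily does.

```python
def delete_successful_perc(successful, expired, failed):
    """Ratio of deletes counted as successful (successful + expired) to all deletes.

    Returns a value between 0.0 and 1.0, or None if there were no delete
    events (assumed behaviour, to avoid dividing by zero).
    """
    total = successful + expired + failed
    if total == 0:
        return None
    return (successful + expired) / total

# With no failed deletes the ratio is 1.0, i.e. the 100% shown on the tile.
print(delete_successful_perc(successful=90, expired=10, failed=0))
# One failure out of five delete events gives 0.8.
print(delete_successful_perc(successful=3, expired=1, failed=1))
```

Multiply the result by 100 if a percentage is wanted rather than a ratio.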

kevink · May 7, 2021, 4:56am

Sounds good to me. The delete metric doesn’t seem very helpful anyway because it seems to be 100% all the time, unless the piece being deleted doesn’t exist anymore.