Logs: how to send relevant log messages to a Discord webhook

Hi there,

I was wondering if I could be made aware of any error messages in the logs when they occur, and not only when I realise my node is offline or similar.

I’ve found that pushing warnings to a Discord webhook would fit my needs best. To that end I’ve found the following howto and script (both at the end of this post).

My questions to you experts:

  1. Which log messages should I care about?
  2. How can I [pipe or grep or summarize] the log analysis results into the .sh script mentioned below?
  3. Ideally I’d run a .sh script regularly with crontab. What are your recommendations here?

Thank you very much in advance!

#!/bin/bash
## take the whole command line as the message text
message="$*"

## format to pass to curl (note: breaks if the message itself contains quotes)
msg_content="\"$message\""

## discord webhook
url='https://my-discord-webhook-url'

## send the message to discord
curl -H "Content-Type: application/json" -X POST -d "{\"content\": $msg_content}" "$url"

https://jasonloong.com/2018/sending-linux-variables-to-discord-webhook-from-linux-bash-shell-script/

Hi @Bivvo, I’ve forwarded your questions to the team for some answers.


Hi @Bivvo,

If you were to do this, then you should be looking for ‘ERROR’ and ‘FATAL’ log entries; those are the entries that indicate the most serious problems for storagenodes.

Linux grep is outside my knowledge. Searching the forum should show something helpful, e.g. this recent forum post where SGC is grep’ing his log for specific entries - GET_REPAIR failed because of invalid signature - #34 by SGC

oh this looks cool, i’ve been looking for something like this…

more or less has the other end of what’s required to make this work… just need to cross some t’s and dot some i’s

If it’s easy, I would be very happy to have some coding help for the bash script(s).

i’ll post my first draft of it here, hopefully sooner rather than later; i’ve had something like this in mind for a long time but was never really sure of where or how to pipe the data.

i plan to make it very straightforward to set up, at least the parts i’ve been working on, and i’ve already performance tested and optimized the solution, even if the last couple of bits don’t work exactly yet.
and thus far it’s pretty sleek, but i don’t want it to be too advanced either, because it will be running all the time.

i’ll throw myself at it later today.

Very cool - I am not much of an expert at bash coding, but will support with fine tuning and handling wherever I can. Thx in advance.

oh i’m not great at it either lol, just sort of stumbling ahead one byte at a time lol…
and maybe i’m a bit picky about how to make it, because i want to consider the performance impact.

my script seems to be failing again and not for the first time… not sure my solution is even viable, since logs aren’t very useful if they don’t work reliably…
i will most likely be switching from my custom logging script to this.


I have taken this bash script:

… and reused it in the success rates script from this post:

E.g. it now looks like this:

..
# integer download success rate in percent, calculated from the success-rate counters
echo_downloads=$(printf '%.0f\n' $(echo -e "$dl_success $dl_failed $dl_canceled" | awk '{print ( $1 / ( $1 + $2 + $3 )) * 100 }'))
if [[ $echo_downloads -lt 98 ]] || [[ $DEB -eq 1 ]]
then
	./discord.sh --webhook-url="$url" --text "$([[ $DEB -eq 1 ]] && echo 'INFO' || echo 'WARNING') downloads: $dl_ratio"
fi
..

This pings me via Discord in case the download rate drops below 98%, and periodically in debug mode just for my (daily) information. With the success rate script prepared, it is called via crontab with and without parameters.
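For illustration, the crontab entries could look roughly like this (the script path and the debug parameter are assumptions, not necessarily the exact setup):

# crontab -e
*/30 * * * * /home/pi/successrate_alert.sh          # alert only if a rate drops below 98%
0 8 * * * /home/pi/successrate_alert.sh debug       # daily INFO ping (assuming "debug" sets DEB=1)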

I’ll extend that with other scripts, e.g. “scanning” the logs for FATAL or ERROR messages and/or SMART tests of the HDD(s), and push those on a regular basis as well.
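A SMART check could be wired up the same way. A minimal sketch, assuming smartmontools is installed and reusing discord.sh and the webhook url from above (the drive path /dev/sda is only an example):

#!/bin/bash
## sketch: push the SMART overall-health result of one drive to Discord
url='https://my-discord-webhook-url'
smart=$(sudo smartctl -H /dev/sda | grep -i 'overall-health')
./discord.sh --webhook-url="$url" --text "SMART /dev/sda: ${smart:-no result}"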

i was told one shouldn’t create one’s own logging scripts… i guess it’s a bit like the issue with running one’s own mail server: it’s just a bad idea, because even though it’s supposedly a fairly simple task, it ends up becoming much more advanced and labor intensive as it progresses.

my failure, i think, was in trying to process the logs while they were being exported, to avoid them being read multiple times, since i want to cut down on iops where i can.
it seemed to work at first, but for some reason the script that would work fine for one container might not execute correctly for others, maybe due to some inherent limitations of my server, or other software issues.

i hope the grafana / prometheus / loki solution can process the logs live without issue, but i’m sure they can do that just fine.

my attempts at processing my logs with custom scripts sure weren’t a good idea, and now, about 1 year in, i can just throw it all out lol

I’ve understood it is not ready for everyday usage - at least I’ve not understood how to implement it. So far, I can live with my custom pings. :wink:


Is there anything I should take care of additionally?

Currently I have a regular “view” on the following log counts:

tmp_disk_usage=$(df -H | grep storage) # for disk usage
tmp_fatal_errors=$(docker logs storagenode 2>&1 | grep FATAL -c) # count of FATAL entries
tmp_rest_of_errors=$(docker logs storagenode 2>&1 | grep ERROR | grep -v -e "collector" -e "piecestore" -c) # ERROR entries, excluding collector/piecestore noise
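
Feeding those counts into the Discord script could then look roughly like this (a sketch; it reuses discord.sh and $url from the posts above):

# ping Discord only when FATAL or unexpected ERROR entries show up
if [[ $tmp_fatal_errors -gt 0 ]] || [[ $tmp_rest_of_errors -gt 0 ]]
then
	./discord.sh --webhook-url="$url" --text "WARNING storagenode: $tmp_fatal_errors FATAL / $tmp_rest_of_errors ERROR entries in the log"
fi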

And just to let you know what it looks like (almost satisfied):


one of the issues with
docker logs storagenode 2>&1
is that it will read the entire storagenode log since the last update, so this is an ever increasing resource demand. ofc it doesn’t matter too much if it doesn’t run too often, but then that defeats much of the point of live tracking it for alerts.

one could set a max size for the storagenode log file in docker or use a daily log file and run the script on that…

this parameter will set the max size for the docker logfiles in megabytes; usually on older nodes it’s like 20-45 MB a day i think…

so if your script runs now, at day 18 since the last docker storagenode release push, your docker log file will be something along the size of 400MB to 2GB, which you seem to be processing twice… and then however many times you decide to do that per day.

#docker log size max parameter set to 1MB
--log-opt max-size=1m \

You can use a --since option

docker logs --help

Usage:  docker logs [OPTIONS] CONTAINER

Fetch the logs of a container

Options:
      --details        Show extra details provided to logs
  -f, --follow         Follow log output
      --since string   Show logs since timestamp (e.g.
                       2013-01-02T13:23:37Z) or relative (e.g. 42m for 42
                       minutes)
  -n, --tail string    Number of lines to show from the end of the logs
                       (default "all")
  -t, --timestamps     Show timestamps
      --until string   Show logs before a timestamp (e.g.
                       2013-01-02T13:23:37Z) or relative (e.g. 42m for 42
                       minutes)
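
For example, the relative form keeps the amount of log read per run small (the container name storagenode is assumed):

# only look at the last 30 minutes of log output
docker logs --since 30m storagenode 2>&1 | grep -c -E 'ERROR|FATAL'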

I’ve done both already: limited the docker log selection with “since”:

LOG="docker logs --since "$(date -d "$date -1 day" +"%Y-%m-%dT%H:%M")"  $DOCKER_NODE_NAME"

… and limited the logs as well:

docker run -d --restart unless-stopped --stop-timeout 300 \
..
    --log-opt max-size=100m \
    --log-opt max-file=5 \
...

i didn’t have much luck with the --log-opt max-file= multiple files thing, but didn’t really give it much of a chance as i didn’t really need it for anything… maybe i just did something wrong.

seems to work for me, I guess:

-rw-r----- 1 root root  53011692 Nov 27 22:47 abc-json.log
-rw-r----- 1 root root 100000841 Nov 26 23:13 abc-json.log.1
-rw-r----- 1 root root 100000057 Nov 24 20:07 abc-json.log.2
...

I intend to optimise logging either by logging to RAM or by logging to another, external HDD, in order to reduce write operations on the RPi’s SD card. Will keep you posted.
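One way to do the RAM variant would be a tmpfs mount for a dedicated log directory (just a sketch; the mount point and size are assumptions, and anything in tmpfs is lost on reboot):

# hypothetical /etc/fstab entry: keep a log directory in RAM
tmpfs   /mnt/storagenode-logs   tmpfs   defaults,noatime,size=200m   0   0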

weird how your logs seem larger than mine… i don’t think my zfs compression is counted when just using ls -l,
because only zfs can see that it’s compressed…
node 1 is 16tb and node 2 is like 4tb, so the size seems almost irrelevant to the log sizes…
and node 3 is even less…
these are all 24 hour logs.
i guess these scripts might also be failing and your logs are the correct size and mine are lacking data; haven’t gotten around to replacing my custom scripts

-rw-r--r-- 1 100000 100000  27269607 Nov 27 22:54 2021-11-27-sn0001.log
-rw-r--r-- 1 100000 100000  30906674 Nov 27 22:54 2021-11-27-sn0002.log
-rw-r--r-- 1 100000 100000  24092844 Nov 27 22:54 2021-11-27-sn0003.log

e.g. 30906674 bytes is around 30 MB, right?
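
Quick arithmetic check: 30906674 bytes is about 30.9 MB decimal, or roughly 29.5 MiB.

$ echo $((30906674 / 1000 / 1000)) MB
30 MB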