Windows/Powershell version of the script:
Linux/Bash version of the script:
Grafana dashboard monitor of the script:
[TIG stack]
Daily Email Version:
Historic Post:
I found it very difficult to determine trends with the current monitoring scripts, and I felt like learning some better bash. I created this script to better measure the ratios of successful transfers over time. I plan to have this run periodically and send the results to an InfluxDB for Grafana monitoring.
Please share your results and hardware!
It currently runs against the docker logs since last RUN.
Of course, this presents new challenges when measuring historically (all-time, since last storagenode run, or last 24hrs). I still have to think how that might work, but at least for now, this could be used to see if it's trending up or down within short periods of time.
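One possible way to limit the window, sketched here as an assumption rather than something the script does: `docker logs` has a `--since` option that accepts relative durations. The helper name `count_successful_audits` is made up for this example; it takes the log command as its arguments so the same counting works against docker or a log file. The grep patterns are the ones used for audits elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count successful audits in whatever log stream
# the given command prints.
# Usage: count_successful_audits <command> [args...]
count_successful_audits() {
  # Run the command passed as arguments and merge stderr into stdout,
  # then count the lines that mention both GET_AUDIT and downloaded.
  "$@" 2>&1 | grep GET_AUDIT | grep -c downloaded
}

# Example calls (need docker and a container named storagenode,
# or a log file at a path of your choosing):
# count_successful_audits docker logs --since 24h storagenode
# count_successful_audits cat /path/to/node.log
```

Note that `--since` can only reach back as far as the logs docker still holds for the current container.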
Example:
- Save to AuditRates.sh
- sudo chmod +x AuditRates.sh
- sudo ./AuditRates.sh
How install bc on Synology ?
I don't have access to Synology, but bc could be substituted in the script. It serves a math function for floating-point results, because bash arithmetic only supports integers (no decimal places). If another package you have can do that, I can modify the script. I'm usually in PowerShell, so I'm probably not using best practice.
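To illustrate the integer limitation, here is a minimal sketch comparing bash arithmetic with awk, which is one common substitute for bc. The two counts are sample values taken from output posted later in this thread; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
successful=528
total=532

# Bash arithmetic is integer-only, so the fractional part is dropped.
echo "$(( successful * 100 / total ))"

# awk does floating-point math, and printf controls the decimal places.
awk -v s="$successful" -v t="$total" 'BEGIN { printf "%.3f\n", s / t * 100 }'
```

The first line prints 99 and the second prints 99.248.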
We could also look into going cross platform through a docker container if demand is high enough.
Could you add "upload rejected"? That was added with the last storage node release.
Made some tweaks.
- Showing more of the in-between numbers in the output.
- Moved the log command to a variable so it can be edited by SNOs who have logs being output to a file or use a different container name.
- Split success rate for audit to a min and max based on recoverable vs unrecoverable errors. The max really should always be 100%.
- Replaced bc with awk so it will work on more systems. (Including Synology)
- Added accepted rate for uploads based on rejected log lines.
Output looks like (Edit: made it more colorful):
Rejected: 0
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
The first version still works fine. I might be missing some necessary little program, had to install bc to get the first version to work.
You're not missing anything. It's somehow trying to divide by 0. Can you show a bit more of your output so I can see the numbers it's trying to work with?
Especially if one of the Successful lines is 0.
========== AUDIT =============
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Usage: docker logs [OPTIONS] CONTAINER
Fetch the logs of a container
Successful: 0
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Usage: docker logs [OPTIONS] CONTAINER
Fetch the logs of a container
Recoverable failed: 0
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Usage: docker logs [OPTIONS] CONTAINER
Fetch the logs of a container
Unrecoverable failed: 0
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
Success Rate Min: 0.000%
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
Success Rate Max: 0.000%
========== DOWNLOAD ==========
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Usage: docker logs [OPTIONS] CONTAINER
Fetch the logs of a container
Successful: 0
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Usage: docker logs [OPTIONS] CONTAINER
Fetch the logs of a container
Failed: 0
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
Success Rate: 0.000%
That it didn't work makes no sense, looking at the script, everything looks ok.
I tried adding an echo of the line about to be executed on the first command.
docker logs storagenode 2>&1 | grep GET_AUDIT | grep downloaded -c
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.
Well, at least I know what to look for. I was testing with cat since I have my log output to a file. Give me a bit. Btw, I'm also adding if statements to catch division by 0 and adding a splash of color.
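A guard of that kind could look roughly like the sketch below. This is not the code from the actual script; `rate` is a hypothetical helper that prints 0.000% when there is nothing to divide by, which matches the 0.000% lines in the failed output above.

```shell
#!/usr/bin/env bash
# rate NUMERATOR DENOMINATOR
# Prints a percentage with three decimals, or 0.000% when the
# denominator is zero instead of letting awk abort.
rate() {
  if [ "$2" -gt 0 ]; then
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f%%\n", n / d * 100 }'
  else
    echo "0.000%"
  fi
}

rate 52 52   # no failures
rate 0 0     # empty log: no division attempted
```

With this in place an empty log produces zero rates without the awk "division by zero attempted" error.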
Your values are empty (default to 0), and bright's script does not include sudo for the docker command. Did you sudo in the .sh? If so, you need to add your account to the docker group, or add sudo to bright's docker commands.
I've updated the script. Apparently bash doesn't evaluate things like multiple commands or the > redirection within a variable, so I pulled the redirection out of the LOG variable. While the redirection is not necessary when using cat, it also doesn't hurt. New version is up at the same link and should work now.
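For anyone curious why that happens, here is a small sketch. `show_args` is a made-up stand-in for `docker logs` that just reports how many arguments it received. When a variable is expanded, bash splits it into words but does not parse redirections again, so `2>&1` arrives as a plain second argument, which would explain the "requires exactly 1 argument" error.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `docker logs`: reports its arguments.
show_args() { echo "$# argument(s): $*"; }

# Redirection stored inside the variable is passed on as a literal word.
LOG="show_args storagenode 2>&1"
$LOG

# Keeping only the command in the variable and writing the redirection
# in the script itself behaves as intended.
LOG="show_args storagenode"
$LOG 2>&1
```

The first call reports 2 arguments (`storagenode` and `2>&1`), the second reports 1.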
@subwolf My docker logs statement gives an empty result since I redirected the logs, so please test and report back.
Use github so I can pull direct.
That works great. Would you be up for pulling all data into a log file and using that for comparison? Say go back a few days to make it more accurate.
========== AUDIT =============
Successful: 528
Recoverable failed: 4
Unrecoverable failed: 0
Success Rate Min: 99.248%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 14636
Failed: 77
Success Rate: 99.477%
========== UPLOAD ============
Successful: 37163
Rejected: 2207
Failed: 465
Acceptance Rate: 94.394%
Success Rate: 98.764%
========== REPAIR DOWNLOAD ===
Successful: 312
Failed: 343
Success Rate: 47.634%
========== REPAIR UPLOAD =====
Successful: 768
Failed: 490
Success Rate: 61.049%
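For reference, the percentages above are consistent with the simple ratios in the sketch below. The formulas are inferred from the posted numbers, not copied from the script, and `pct` is a hypothetical helper (without a zero check, since none of these denominators are zero).

```shell
#!/usr/bin/env bash
# pct NUMERATOR DENOMINATOR: percentage with three decimals.
pct() { awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f%%\n", n / d * 100 }'; }

pct 528   $((528 + 4 + 0))     # audit min: successful / all audits
pct 14636 $((14636 + 77))      # download: successful / (successful + failed)
pct 37163 $((37163 + 2207))    # upload acceptance: successful / (successful + rejected)
pct 37163 $((37163 + 465))     # upload success: successful / (successful + failed)
```

This prints 99.248%, 99.477%, 94.394% and 98.764%, matching the output posted above.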
Dumb question, but what does docker pull for logs by default? Is there a rotation?
Docker retains logs for the current container without limits or rotation by default. So when you rm the container (which happens during update as well) you also remove the logs and start over.
This is why I output logs to a file. You can do that by adding the following lines to your config.yaml file and then restarting your node.
# output location for log
log.output: "/app/config/node.log"
What is the amount saved by default? Would prefer some kind of hours figure.
Github version here: https://github.com/ReneSmeekes/storj_success_rate
I also replaced the link in my previous post to this one. I'll update here if I make changes from now on.
Bear in mind the instance had only been up 10 minutes after a reboot:
[sudo] password for subwolf:
========== AUDIT =============
Successful: 52
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 3938
Failed: 12
Success Rate: 99.696%
========== UPLOAD ============
Successful: 6761
Rejected: 1553
Failed: 126
Acceptance Rate: 81.321%
Success Rate: 98.170%
========== REPAIR DOWNLOAD ===
Successful: 33
Failed: 35
Success Rate: 48.529%
========== REPAIR UPLOAD =====
Successful: 85
Failed: 40
Success Rate: 68.000%