Howto: storage node health check > discord + email alerting script

About speed: my system is not overloaded (HDD about 20% busy, CPU below 10%).
I tried commenting out the renice call, but the speed stays the same (about 25 min per node).
It seems to hang for about 15 min between “storj versions: current larger” and “docker log 720m selected : #235101”.
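To measure exactly where those 15 minutes go, one option is to prefix every line of the verbose output with the elapsed time. This is a minimal bash sketch. The function name `stamp` is hypothetical and not part of storj-system-health.sh. Because it is unclear whether the debug lines go to stdout or stderr, the usage line merges both streams.

```shell
#!/usr/bin/env bash
# stamp (hypothetical helper, not part of storj-system-health.sh):
# prefix every line read from stdin with the seconds elapsed since it started.
stamp() {
  local start=$SECONDS line
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the -n test also prints a final line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '[%4ds] %s\n' "$((SECONDS - start))" "$line"
  done
}

# Example (run from the script's directory), merging stderr so no debug line is missed:
#   ./storj-system-health.sh -vq 2>&1 | stamp
```

A large jump in the elapsed column between the “storj versions” line and the “docker log 720m selected” line shows how long that step really takes.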

This is the output in debug mode:
./storj-system-health.sh -vq

*** timestamp [09.05.2023 21:30]
*** config file loaded: ./storj-system-health.credo
*** settings file path: .storj-system-health

running the script for node “storagenode201” (/node201) …
*** node is running : 1
*** disk usage : 52.80% (incl. trash: 53.48%)
*** satellite scores url : localhost:14001/api/sno/satellites (OK)
… satellite scores:
… satping difference: 39300 (1683667812 - 1683628512) / freq: 3600
*** settings: satellite pings will be sent: false
*** storj node api url : localhost:14001/api/sno (OK)
*** storj version current : installed 1.78.2
*** storj version latest : github 1.76.2 [2023-04-03]
… storj versions unequal
… storj versions: current larger
*** docker log 720m selected : #235101
*** docker log 60m selected : #18322
*** info count : #18313
*** audit error count : #0
*** repair failures count : #0
*** fatal error count : #0
*** severe count : #0
*** other error count : #0
*** i/o timouts count : #0
*** audits : w: 0.00%, c: 0.00%, s: 100%
*** downloads : c: 2.16%, f: 0.02%, s: 98%
*** uploads : c: 0.04%, f: 0.27%, s: 100%
*** repair downloads : c: 0.00%, f: 0.00%, s: 100%
*** repair uploads : c: 0.00%, f: 0.08%, s: 100%
*** 60 m activity : up: 9772 / down: 5495 > OK
*** i/o timouts ignored : false
… audit time lags selection:
… settings read: (declare -A settings=([storagenode218_payTimestamp]="1683586027" [storagenode219_payTimestamp]="1683586938" [storagenode218_payValue]="0" [storagenode201_payValue]="0" [satping]="1683628512" [storagenode219_payValue]="0" [storagenode201_payTimestamp]="1683629746" )).
… settings : tmp_todayDay=9
… settings : tmp_todayHour=21
… settings : tmp_todayMinutes=45
… settings : storagenode201_payTimestamp found.
… settings : storagenode201_payValue found.
… settings : storagenode201_payTimestamp=1683629746
… settings : storagenode201_payValue=0
… settings : tmp_payTimestamp=1683629746
… settings : tmp_payDateDay=9
… settings : tmp_payDateHour=10
… settings : tmp_payDateMinutes=55
… settings : tmp_egressBandwidthPayout=121.62
… settings : tmp_egressRepairAuditPayout=21.29
… settings : tmp_diskSpacePayout=207.15
… settings : tmp_currentMonthExpectations=910
… settings : tmp_estimatedPayoutTotal calculated: 350.06
… settings : tmp_payDiff=350.06
… push message sending: sendpush: false, discordon: true, hour: 21, minutes: 45, details: false
*** no discord success rates to be sent.

It stalls exactly at the point where the logs are read.
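To see whether the delay comes from docker itself or from the script's processing afterwards, the raw log read can be timed on its own. A minimal sketch, assuming the node runs as a docker container. The container name (storagenode201 here) and the function names are placeholders, not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for timing the raw docker log read outside the script.

# count_log_lines CONTAINER WINDOW: print how many log lines docker returns
# for the given --since window (e.g. 720m or 60m).
count_log_lines() {
  local container="$1" window="$2" n
  # Merge stderr into stdout so lines on either stream are counted.
  n=$(docker logs --since "$window" "$container" 2>&1 | wc -l)
  echo $((n))   # arithmetic expansion strips padding some wc builds add
}

# time_log_read CONTAINER WINDOW: print "<lines> lines in <seconds>s".
time_log_read() {
  local start lines
  start=$SECONDS
  lines=$(count_log_lines "$1" "$2")
  echo "$lines lines in $((SECONDS - start))s"
}

# Example (container name is an assumption; use your own):
#   time_log_read storagenode201 720m
#   time_log_read storagenode201 60m
```

If the docker read alone finishes in seconds, most of the time is likely spent elsewhere in the script rather than in docker.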

I have 70k entries within 360 min, and the script takes 1 min to run for 2 nodes.

So for some reason, the log selection on your system looks really slow. Hmm.
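A rough check of that observation, as a sketch under stated assumptions: the 70k entries are taken as per node, “1 min for 2 nodes” as about 30 s per node, and the ~25 min per node and #235101 selected lines come from the debug run above. The `ratio` helper is hypothetical.

```shell
#!/usr/bin/env bash
# ratio A B: print A/B rounded to one decimal place (awk handles the floats).
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# Figures from this thread (assumed to be per node).
slow_lines=235101; slow_secs=$((25 * 60))   # 720m selection, ~25 min per node
fast_lines=70000;  fast_secs=$((60 / 2))    # 360 min of logs, 1 min for 2 nodes

echo "log volume: $(ratio "$slow_lines" "$fast_lines")x"   # prints 3.4x
echo "runtime:    $(ratio "$slow_secs" "$fast_secs")x"     # prints 50.0x
echo "throughput: $(ratio "$slow_lines" "$slow_secs") vs $(ratio "$fast_lines" "$fast_secs") lines/s"   # 156.7 vs 2333.3
```

About 3.4x more log lines against roughly 50x the runtime suggests that log volume alone does not explain the difference.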

v1.10.3-5 improvement & bugfix releases:

  • fixed a “command not found” issue when there are pending audits
  • minor tweaks to the README file, related to the crontab examples mentioned there
  • minor tweaks to onlineScore and download/upload warnings

That script is impressive. I hope it's not too slow or load-intensive, but I'm keen to try it. It looks like I could use it on my Debian servers. Cheers.


Thank you! It has helped me a lot over the last few months: it alerts quickly when there are issues, which lets me react very fast.

Some features can be skipped to speed it up; just check the -h output and the README on GitHub.

If you have any questions or issues, simply send me a PM.
