I wrote a bash healthcheck script that restarts my Docker container and reports my node status to my WhatsApp (details here)

radicalrj · January 24, 2025, 12:10am

It is 1 am and I wrote this script together with ChatGPT because I am lazy.

Initially it was supposed just to restart my Docker container in case the node is down.
Then I decided to integrate it with the CallMeBot API, since it has a free WhatsApp integration.
Then I decided to add a node report every 6h, skipping sleep time.
And my small ChatGPT script started to become a MONSTER!
It is now so big that ChatGPT can no longer help me!
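The "report every 6h, skipping sleep time" rule could be decided by a small helper like the one below. This is only a sketch: `REPORT_HOURS` and `report_due` are hypothetical names, and the chosen hours are an assumption, not values taken from the actual script.

```bash
#!/bin/bash

# Hypothetical schedule: every 6 hours (2, 8, 14, 20) minus the night slot at 2.
REPORT_HOURS=(8 14 20)

# Succeeds if a report is due at the given hour (0-23).
# Without an argument, the current local hour is used.
report_due() {
    local hour="${1:-$(date +%H)}"
    hour=$((10#$hour))   # force base 10 so "08" and "09" are not read as octal
    local h
    for h in "${REPORT_HOURS[@]}"; do
        [ "$h" -eq "$hour" ] && return 0
    done
    return 1
}
```

Called from an hourly cron job, something like `report_due && send_report` would send at 08:00, 14:00 and 20:00 only (`send_report` being whatever function builds the message).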

I plan to refactor it later; I can think of many ways to make it simpler. (I want at least to separate the Docker healthcheck from the reporting.)

But for now I am sharing the solution here; it is fully functional with all the above features.
I have not tested a Docker container crash yet, but the script looks correct to me: it will trigger a notification and try to restart the container.

https://gist.github.com/rafaelbiriba/26ce578b1c5ea2531567b846d207f123

storj_healthcheck.sh

#!/bin/bash

# Configuration
CALLMEBOT_API="https://api.callmebot.com/whatsapp.php"
PHONE_NUMBER="4917632323232"
API_KEY="12345678"
DOCKER_CONTAINERS=("storagenode1" "storagenode2")
STORAGE_PATHS=("/mnt/user/Storj/" "/mnt/user/Storj2/")
IDENTITY_PATHS=("/mnt/user/appdata/storj/Identity/storagenode/" "/mnt/user/appdata/storj/Identity/storagenode2/")
NODE_API_URLS=("http://192.168.0.2:14002/api/sno/" "http://192.168.0.2:14003/api/sno/")

(This file has been truncated; see the gist link above for the full script.)
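Since the excerpt stops at the configuration, here is a minimal sketch of how the restart-and-notify step might look. The function names (`send_whatsapp`, `is_running`, `check_container`) are hypothetical and not taken from the gist, and the phone number and API key are the same placeholders as in the configuration. It assumes CallMeBot's `phone`, `text` and `apikey` query parameters and a Docker CLI available to the user running the script.

```bash
#!/bin/bash

# Placeholder values in the same style as the configuration excerpt.
CALLMEBOT_API="https://api.callmebot.com/whatsapp.php"
PHONE_NUMBER="4917632323232"
API_KEY="12345678"

# Send a WhatsApp message through CallMeBot; curl URL-encodes each parameter.
send_whatsapp() {
    curl -s -G "$CALLMEBOT_API" \
        --data-urlencode "phone=$PHONE_NUMBER" \
        --data-urlencode "text=$1" \
        --data-urlencode "apikey=$API_KEY" > /dev/null
}

# Succeeds only if Docker reports the container as running.
is_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Restart a container that is not running and report the outcome.
check_container() {
    local name="$1"
    if is_running "$name"; then
        return 0
    fi
    send_whatsapp "Container $name is down, restarting it"
    if docker restart "$name" > /dev/null 2>&1; then
        send_whatsapp "Container $name restarted"
    else
        send_whatsapp "Restart of $name failed"
        return 1
    fi
}
```

A loop such as `for c in "${DOCKER_CONTAINERS[@]}"; do check_container "$c"; done` would then cover every container listed in the configuration.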

snorkel · January 24, 2025, 8:47am

What is the trigger? Does it just ping the node and get no response? Like Uptime Robot? Or something else?
Could it falsely report “node is down” and restart it during an update?
You should add a delay… (2 minutes maybe?), to let updates install and restart the node.
Also, it should ping something else on that machine, to make sure it’s just the node that is offline, not the network/internet/router.
Also, you could report the status of the last SMART test. I’m struggling with email reports of it on Ubuntu Server, because I don’t want to save my email password on the server or bypass two-factor auth. So WhatsApp would be a good option.
And change the saltlake sat to SLT to align the displayed stats with the other sats.
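The delay and the "is it really just the node?" check suggested here could be sketched roughly as follows. All names and values are assumptions for illustration: `GRACE_SECONDS`, `PROBE_INTERVAL`, `recovers_within_grace` and `network_is_up` do not come from the script, and the `ping -W` timeout is in seconds on Linux (iputils) only.

```bash
#!/bin/bash

# Hypothetical values: give the node up to 2 minutes, probing every 10 seconds.
GRACE_SECONDS=120
PROBE_INTERVAL=10

# Runs the command passed as arguments repeatedly and succeeds as soon as
# it succeeds once; fails if the grace period runs out first.
recovers_within_grace() {
    local waited=0
    while [ "$waited" -lt "$GRACE_SECONDS" ]; do
        "$@" && return 0
        sleep "$PROBE_INTERVAL"
        waited=$((waited + PROBE_INTERVAL))
    done
    return 1
}

# Succeeds if a reference host (for example the router) answers one ping.
network_is_up() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Example use, with node_probe standing in for whatever check the script runs
# and 192.168.0.1 as an assumed router address:
#   if ! recovers_within_grace node_probe && network_is_up 192.168.0.1; then
#       echo "node is down but the network is fine, restarting"
#   fi
```

With this shape, an update that restarts the node within two minutes never triggers a restart, and a router outage is not mistaken for a node failure.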

MarviBiene · January 24, 2025, 11:22am

I use the Uptime Robot HTTP check. It works like a charm. I get an almost instant notification if the node becomes unreachable. And if the healthy status changes to false, it triggers Uptime Robot too.

Alexey · January 25, 2025, 7:37am

I also do not understand why not use the open-source Uptime Kuma if UptimeRobot is not acceptable for some reason.

But more options should be better.

Alexey · January 25, 2025, 7:38am

false usually means that the reputation has been affected, so - a smart move!

MarviBiene · January 25, 2025, 9:52am

Yes, I got it when my online score was affected too much.
I love that you guys implemented the “All healthy=false” status as an HTTP 500 error♥️ So it should trigger any monitoring software automatically.
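For anyone who wants the same HTTP check in a script instead of a monitoring service, a minimal sketch could look like this. The function names are hypothetical, and the health URL is deliberately left as an argument because the thread does not name the endpoint. It relies on curl printing `000` as the status code when no response arrives.

```bash
#!/bin/bash

# Map an HTTP status code to a node state.
classify_status() {
    case "$1" in
        200) echo "healthy" ;;
        500) echo "unhealthy" ;;     # the "All healthy = false" case
        000) echo "unreachable" ;;   # curl got no HTTP response at all
        *)   echo "unknown" ;;
    esac
}

# Print only the HTTP status code for a URL, giving up after 10 seconds.
probe_url() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Example use, with HEALTH_URL set to the node's health endpoint:
#   state="$(classify_status "$(probe_url "$HEALTH_URL")")"
```

Keeping the classification separate from the network call makes it easy to treat "unhealthy" (notify only) differently from "unreachable" (candidate for a restart).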