Can someone help me create a CURL command to check if nodes are online?

joesmoe · October 22, 2020, 9:11am

For each line in docker ps that begins with storj*, I want to curl the RPC port, and if it reports OFFLINE, I want it to give me an alert (let's just say write it to a file).

That way I could run this on each of the storj machines instead of trying to do some centralized panel.

Any ideas? My bash skills suck.

Thanks in advance!

Pentium100 · October 22, 2020, 10:07am

Try this:

#!/bin/bash
lastPinged=`/usr/bin/curl -s "127.0.0.1:14002/api/sno/" | /usr/bin/jq ".lastPinged" | cut -d'"' -f2`
lastTS=`date --date "$lastPinged" +%s`
now=`date +%s`
lastContact=`expr $now - $lastTS`

echo $lastContact

It gives you how long ago (in seconds) was the last ping. If it’s longer than a few seconds then your node is offline.

I tested it on debian 10. If your OS is different, you may have to change the paths.
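To turn that number into the file alert asked for above, something like the following might work. It is only a sketch: the 300-second threshold, the alert file path and the function names are assumptions made for illustration, and it relies on GNU date just like the script above.

```bash
#!/bin/bash
# Sketch only: THRESHOLD and ALERT_FILE are assumed values, not from the thread.
THRESHOLD=300                                        # seconds without a ping before alerting (assumption)
ALERT_FILE="${ALERT_FILE:-/tmp/storj_offline.log}"   # hypothetical path

# seconds_since ISO_TIMESTAMP [NOW_EPOCH]
# Prints the number of seconds between the timestamp and now (GNU date required).
seconds_since() {
    local ts now
    ts=$(date --date "$1" +%s) || return 1
    now=${2:-$(date +%s)}
    echo $(( now - ts ))
}

# check_last_ping ISO_TIMESTAMP [NOW_EPOCH]
# Appends a line to ALERT_FILE and returns 1 when the last ping is older than THRESHOLD.
check_last_ping() {
    local age
    age=$(seconds_since "$1" "$2") || return 2
    if [ "$age" -gt "$THRESHOLD" ]; then
        echo "$(date -u +%FT%TZ) OFFLINE: last ping ${age}s ago" >> "$ALERT_FILE"
        return 1
    fi
    return 0
}
```

It could be called as check_last_ping "$lastPinged" right after the curl line in the script above.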

joesmoe · October 22, 2020, 10:11am

Thanks!!

Is it possible to put some sort of for loop, where it would look at docker ps and run this on the varying ports?

So like these are two examples of nodes running on one machine.

d6de00994941 storjlabs/storagenode:latest "/entrypoint" 9 minutes ago Up 9 minutes 0.0.0.0:14956->14002/tcp, 0.0.0.0:23876->28967/tcp storj-q156
3d411d8269d0 storjlabs/storagenode:latest "/entrypoint" 9 minutes ago Up 9 minutes 0.0.0.0:14955->14002/tcp, 0.0.0.0:23875->28967/tcp storj-q155

I would like the bash script to automatically load the port 14956 and 14955 in this case, then curl them…

Pentium100 · October 22, 2020, 11:27am

Is there a preference for the output format?
Or would something like:

14956: 0
14955: 1

be ok?

marckt · October 22, 2020, 12:28pm

port=28967
host="x.x.x.x"
for b in $host
do
a=$(nmap -v "$b" -Pn -p "$port" | grep -c "tcp open")
if [ "$a" -eq 0 ];
then
....
fi
done

joesmoe · October 22, 2020, 1:51pm

I really just want a list of node names that are offline so i can pipe it into an email or a telegram message or something. Have it run every hour and get real alerts when things are offline.
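One possible shape for that "names only" list, as a sketch: it assumes some earlier step already prints one "name seconds-since-last-ping" pair per line, and the function name and the 300-second default are made up for illustration.

```bash
#!/bin/bash
# Sketch only: the "NAME SECONDS" input format and the 300 s default are assumptions.

# offline_names [THRESHOLD]
# Reads "name seconds_since_last_ping" lines on stdin and prints the names whose
# age exceeds THRESHOLD, or whose age is missing or not a plain number.
offline_names() {
    local threshold=${1:-300}
    awk -v t="$threshold" '
        NF == 0           { next }            # skip empty lines
        $2 !~ /^[0-9]+$/  { print $1; next }  # no usable age: treat as offline
        $2 + 0 > t + 0    { print $1 }        # last ping too long ago
    '
}
```

The output could then be piped into mail or a Telegram bot, and the whole script run hourly from cron with an entry such as 0 * * * * /path/to/check.sh (path hypothetical).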

oh and thanks for the help hah

joesmoe · October 22, 2020, 1:52pm

Thanks for the reply.

I'm more interested in curl’ing the actual node’s API/RPC/whatever you call it, since I have the ports open and sometimes the node still shows offline. Usually stopping, rm’ing, and restarting the docker instance fixes this, but I need some sort of alert for it.

Pentium100 · October 22, 2020, 2:08pm

Try this:

#!/bin/bash

ports=`docker ps | tr "," "\n" | grep 14002 | cut -d"-" -f1 | cut -d":" -f2`

for port in $ports; do

lastPinged=`/usr/bin/curl -s "127.0.0.1:$port/api/sno/" | /usr/bin/jq ".lastPinged" | cut -d'"' -f2`
lastTS=`date --date "$lastPinged" +%s`
now=`date +%s`
lastContact=`expr $now - $lastTS`

echo $port: $lastContact
done

I cannot test it fully because my VM has only one node.
It should display port number and how many seconds since last contact.

jeremyfritzen · October 22, 2020, 2:34pm

Why not use Uptime Robot?
It does the job pretty well. I’ve been using it for several months and it has let me handle downtime quickly.

joesmoe · October 22, 2020, 4:30pm

Thanks!

This is what I get:

bash check.sh

latest: 66497
"/entrypoint": 66497
6: 66497
hours: 66497
ago: 66497
Up: 66497
6: 66497
hours: 66497
0.0.0.0: 66497
latest: 66497
"/entrypoint": 66497
6: 66497
hours: 66497
ago: 66497
Up: 66497
6: 66497
hours: 66497
0.0.0.0: 66497
latest: 66497
"/entrypoint": 66497
6: 66497
hours: 66497
ago: 66497
Up: 66497
6: 66497

Getting closer, I think those are the uptimes. So now I just need to find a way to echo the name of the docker instance that does not return an uptime (i.e. a node that is offline).

Please note that even though a node may show that it is up some hours, it may still not be online.

joesmoe · October 22, 2020, 4:31pm

As stated before, this is a very simple check that ensures only that the machine is pingable. It does not check each node running within a network or on a certain machine.

Pentium100 · October 22, 2020, 6:34pm

This is interesting, probably some difference between your system and mine

OK, can you post the outputs of these commands:
docker ps

docker ps | tr "," "\n" | grep 14002 | cut -d"-" -f1 | cut -d":" -f2

joesmoe · October 22, 2020, 6:35pm

latest "/entrypoint" 57 minutes ago Up 40 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 40 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0
latest "/entrypoint" 8 hours ago Up 39 minutes 0.0.0.0

joesmoe · October 22, 2020, 6:36pm

And for example here is a full output of docker ps

ed47cf21a9d2 storjlabs/storagenode:latest "/entrypoint" 58 minutes ago Up 40 minutes 0.0.0.0:14960->14002/tcp, 0.0.0.0:23880->28967/tcp storj-q160

(well the first line of it at least)

Pentium100 · October 22, 2020, 6:39pm

OK, I know the problem now. I have debug port enabled so the port column has three values.

#!/bin/bash

ports=`docker ps | tr " " "\n" | grep 14002 | cut -d"-" -f1 | cut -d":" -f2`

for port in $ports; do

lastPinged=`/usr/bin/curl -s "127.0.0.1:$port/api/sno/" | /usr/bin/jq ".lastPinged" | cut -d'"' -f2`
lastTS=`date --date "$lastPinged" +%s`
now=`date +%s`
lastContact=`expr $now - $lastTS`

echo $port: $lastContact
done

One character was changed: tr "," "\n" -> tr " " "\n"
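If splitting the default docker ps table keeps being fragile, a variant that asks docker for just the name and ports columns might be easier to live with, and it keeps the container name next to each port. This is an untested sketch: dashboard_ports is a made-up helper name, and it assumes the dashboard is on container port 14002 as in the output above.

```bash
#!/bin/bash
# Sketch only: assumes the dashboard listens on container port 14002, as in the thread.

# dashboard_ports
# Reads "NAME PORTS" lines on stdin, as printed by
#   docker ps --format '{{.Names}} {{.Ports}}'
# and prints "NAME HOSTPORT" for every container that maps a host port to 14002/tcp.
dashboard_ports() {
    local name ports
    local re=':([0-9]+)->14002/tcp'
    while read -r name ports; do
        # e.g. ports="0.0.0.0:14960->14002/tcp, 0.0.0.0:23880->28967/tcp"
        if [[ $ports =~ $re ]]; then
            echo "$name ${BASH_REMATCH[1]}"
        fi
    done
}
```

Usage would be along the lines of docker ps --format '{{.Names}} {{.Ports}}' | dashboard_ports, looping over the resulting "name port" pairs instead of bare ports.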

joesmoe · October 22, 2020, 6:42pm

Great! That seems to be working.

Why am I getting such large values?

14960: 63738988884

Thanks a lot, p100!

Pentium100 · October 22, 2020, 6:43pm

I don’t know. What does this give you?
/usr/bin/curl -s "127.0.0.1:14960/api/sno/" | /usr/bin/jq ".lastPinged"

joesmoe · October 22, 2020, 6:45pm

"0001-01-01T00:00:00Z"

It’s also very strange, because just pinging the API like this seems to have brought nodes that were offline back online?!

joesmoe · October 22, 2020, 6:47pm

The really long strings seem to correlate to offline nodes that display last contact as a long time ago (incorrectly) such as what I’m asking about here

Skyblockpro1 · October 22, 2020, 8:10pm

The length of time is actually the time since 0001-01-01, the zero timestamp the API returned above, not since 1970-01-01 where computer time starts. In my experience the last contact doesn’t display the correct time, so I disregard it; I usually take from it that the node is offline and nothing more.
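The arithmetic can be checked against the numbers in this thread: 0001-01-01T00:00:00Z lies 62135596800 seconds before the Unix epoch, and 63738988884 - 62135596800 = 1603392084, which is 22 October 2020, 18:41 UTC, about when the script was run. A small sketch for flagging that case before doing any date math follows; treating the zero timestamp as "never pinged" is an assumption based on the output above, and never_pinged is a made-up helper name.

```bash
#!/bin/bash
# Sketch only: reading the zero timestamp as "never pinged" is an assumption
# drawn from the output in this thread, not from documented API behaviour.
ZERO_TS="0001-01-01T00:00:00Z"

# never_pinged LAST_PINGED
# Returns 0 (true) when the value is empty, the string "null", or the zero timestamp.
never_pinged() {
    case "$1" in
        ""|null|"$ZERO_TS") return 0 ;;
        *)                  return 1 ;;
    esac
}
```

Inside the loop it could be used as: if never_pinged "$lastPinged"; then echo "$port: never pinged"; continue; fi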