Better solution for monitoring than UptimeRobot?

Skyblockpro1 · July 9, 2021, 12:11pm

So I got it all up and running and this is the result:

it warns me if the port is closed or if the node is offline
or if the Zabbix agent isn’t available which means the system is unreachable

Alexey · July 10, 2021, 7:48am

The problem is with failed audits due to timeouts: the port is open, but the service is not responsive and cannot provide a piece for audit, because the system has become unstable and the service is just too slow to respond (more than 5 minutes).
I have no idea how to detect this except by a lack of audits in the logs (usually you should receive no fewer than 1 audit per hour).

Maybe request the audits history by schedule?
See Node Online status - #4 by Alexey
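A minimal sketch of the "lack of audits in the logs" check described above. It assumes each log line starts with an ISO 8601 timestamp and that audit downloads contain the text `GET_AUDIT`; both are assumptions about the log format, and the function names are made up for illustration, so adjust them to what your node actually writes.

```python
from datetime import datetime, timedelta, timezone


def last_audit_time(log_lines):
    """Return the timestamp of the newest line mentioning GET_AUDIT, or None."""
    latest = None
    for line in log_lines:
        if "GET_AUDIT" not in line:  # assumed marker for audit requests
            continue
        stamp = line.split(maxsplit=1)[0]
        try:
            # Older Python versions do not accept a trailing "Z", so map it to UTC.
            when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            continue  # line does not start with a parseable timestamp
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        if latest is None or when > latest:
            latest = when
    return latest


def audits_are_stale(log_lines, now, max_gap=timedelta(hours=1)):
    """True if no audit was seen at all, or the newest one is older than max_gap."""
    latest = last_audit_time(log_lines)
    return latest is None or now - latest > max_gap
```

Run on a schedule against the recent part of the log, a `True` result would be the signal to alert. The one-hour default follows the "not less than 1 audit per hour" rule of thumb from the post.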

Also, with a port checker inside the network you can detect that the service is still running, but that does not answer the question of whether it is available from outside of your network (99.99% of cases).

BrightSilence · July 11, 2021, 12:05am

With the exception of very new nodes which don’t have a lot of data yet, I would add.

twl · July 15, 2021, 6:39am

If that app will be Android exclusive, you’re gonna be in big trouble mister!

striker43 · July 15, 2021, 9:36am

Don’t worry, it will be Android and iOS
But at the moment I am too busy with my real job, so there is not a lot of progress unfortunately

SGC · July 16, 2021, 7:21am

I’m testing out a solution that should take care of most considerations and issues.
@striker43 I will need an app or such on the monitoring device to make it work.

cy2k · July 16, 2021, 2:36pm

Can’t wait to hear more about what you’ve got in the works! Thanks, @SGC

hashbackup · November 7, 2021, 4:08pm

I have used uptimerobot.com for a long time to monitor the HashBackup upgrade server.

I was thinking, instead of just monitoring whether a port was open, Storj’s node software could have a special https endpoint for health checking and UptimeRobot’s keyword feature could be used to check the results:

Keyword monitoring

Use keyword monitoring to check presence or absence of specific text in the request’s response body (typically HTML or JSON).

So when UTR fetches the /status URL, the node software does various health checks like:

write a dummy file in the blob directory and do an fsync(), make sure it doesn’t take “too long”, whatever that may mean (maybe a config option)
delete the dummy file
open any important SQLite databases and do a simple query to make sure it works to detect “database disk image is malformed” problems

If all these checks succeed, the endpoint returns a success keyword or JSON doc. If any fail, the endpoint returns a JSON doc with all of the errors encountered. UTR probably can’t do anything useful with that error doc, but maybe future monitoring software could.
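To make the idea concrete, here is a rough sketch of such an endpoint as a standalone Python script. This is not how the Storj node software works; it only illustrates the proposal. `BLOB_DIR`, `DB_PATHS`, the port, the 5 second limit and the keywords `ALL_CHECKS_PASSED` / `CHECKS_FAILED` are all placeholders I made up, and it serves plain HTTP rather than https to keep the example short.

```python
import json
import os
import sqlite3
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOB_DIR = "/path/to/blobs"   # hypothetical: directory the node writes pieces to
DB_PATHS = []                 # hypothetical: SQLite files worth checking
MAX_WRITE_SECONDS = 5.0       # hypothetical meaning of "too long"


def check_disk_write(blob_dir, max_seconds):
    """Write and fsync a dummy file, delete it, return an error string or None."""
    path = os.path.join(blob_dir, ".healthcheck.tmp")
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            f.write(b"healthcheck")
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
        if elapsed > max_seconds:
            return f"write+fsync took {elapsed:.2f}s (limit {max_seconds}s)"
    except OSError as exc:
        return f"write failed: {exc}"
    finally:
        try:
            os.remove(path)  # delete the dummy file
        except OSError:
            pass
    return None


def check_database(db_path):
    """Open the database read-only and run a trivial query."""
    try:
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5)
        try:
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return f"{db_path}: {exc}"
    return None


def run_checks(blob_dir, db_paths, max_seconds):
    """Run every check and return the list of error strings (empty if healthy)."""
    results = [check_disk_write(blob_dir, max_seconds)]
    results += [check_database(p) for p in db_paths]
    return [r for r in results if r is not None]


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        errors = run_checks(BLOB_DIR, DB_PATHS, MAX_WRITE_SECONDS)
        keyword = "CHECKS_FAILED" if errors else "ALL_CHECKS_PASSED"
        body = json.dumps({"status": keyword, "errors": errors}).encode()
        self.send_response(503 if errors else 200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8080):
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer((host, port), StatusHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a keyword monitor would then look for `ALL_CHECKS_PASSED` in the response body. The two keywords are chosen so that neither is a substring of the other, otherwise a keyword match could succeed on the failure response too.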

Alexey · November 7, 2021, 6:33pm

I’ll be waiting for your PR with hope

stor-toq · November 8, 2021, 2:54pm

I was just down for 2 days with no idea things weren’t working. Thank you for the suggestion of uptime robot, just got that set up and hope it works.

Alastran · May 28, 2022, 11:43am

I am also now looking for another monitoring service to replace this. Colleagues recommend a monitoring service called HostTracker. Has anyone used them? Their prices seem reasonable, and the pricing tiers are designed for different needs.

NoNodesense · February 15, 2023, 5:27am

Have you made any progress?
I’ve heard of NEMS Linux core for local monitoring; however, I am not sure this is the correct application. I think it just pings the system. What are some ways to monitor the docker container instead of, or in addition to, the host?

Alexey · February 15, 2023, 5:46am

The main point is to monitor the availability of your node from outside your network, because inside the network it may run just fine while your ISP has connectivity issues, or your external address has changed, or the ISP has decided to place you behind their NAT (CGNAT), or your router is overloaded and has started to drop packets, etc.
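For illustration, the simplest external check is a TCP connect run from a machine outside your network (a VPS, for example). A small sketch follows; the host name is a placeholder, and 28967 is used only as an example port, so substitute whatever address and port your node is actually published on.

```python
import socket


def port_is_reachable(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_node(host="node.example.com", port=28967):
    """Example wrapper; host and port here are placeholders, not real values."""
    return "reachable" if port_is_reachable(host, port) else "UNREACHABLE"
```

This only proves that something accepts connections on that port from the outside. It says nothing about whether the service behind the port is responsive.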

MattJE96011 · February 15, 2023, 9:21am

Snag a cheap VPS somewheres and install. Supports a whole range of monitoring options including TCP port and docker containers. Supports all sorts of notification pathways. Monitor as many things as your VPS can handle.

NoNodesense · February 15, 2023, 1:55pm

That reminds me, Network Chuck mentioned this in a YouTube video recently. I will check them out!

lyoth · February 16, 2023, 3:17am

Why not use prometheus, storj-exporter and grafana?

I have this set up, and it alerts me on Discord whenever one of my nodes is down, or is up but not getting traffic.

Alexey · February 16, 2023, 3:32am

Specifically

The running node is not the same as externally accessible node.

lyoth · February 16, 2023, 5:01am

Grafana gives you a free Grafana Cloud instance that they host.
Then I expose my Prometheus instance, hosted on my computer, over https with username and password authentication, and Grafana Cloud connects to it. If Grafana is not able to communicate with my Prometheus, one of my alerts (i.e. node down or node no traffic) will alert me on Discord.

The Grafana Cloud instance supports Prometheus out of the box, so you can push storj-exporter metrics directly up to the Prometheus hosted on Grafana. I didn’t go with this method since it only keeps metrics for a 30-day rolling window, and I wanted to keep a year’s worth of metrics.
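The two alerts mentioned above ("node down" and "node no traffic") boil down to a small decision rule. The sketch below is not lyoth's actual alert configuration and does not use any real storj-exporter metric names. It assumes you already have a list of readings of some monotonically increasing traffic counter over the alert window, with `None` standing in for a failed scrape.

```python
def classify_node(samples, min_increase=1):
    """Classify a node from counter readings (oldest first, None = failed scrape).

    Returns "down", "no_traffic", "ok", or "unknown" when there is too
    little data to decide.
    """
    if not samples or samples[-1] is None:
        return "down"  # the latest scrape failed, or there is no data at all
    readings = [s for s in samples if s is not None]
    if len(readings) < 2:
        return "unknown"
    # Sum only positive steps so a counter reset (node restart) is not
    # mistaken for negative traffic. This is a simplification.
    increase = sum(max(b - a, 0) for a, b in zip(readings, readings[1:]))
    return "ok" if increase >= min_increase else "no_traffic"
```

For example, `classify_node([10, 10, 10])` gives `"no_traffic"` and `classify_node([10, None])` gives `"down"`. In a real Prometheus setup the same logic would normally live in alert rules rather than in a script.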

mdmeyerpfa · July 20, 2024, 12:35am

Just started using UptimeRobot. Extremely easy to set up for all my nodes and provides a nice dashboard. However, it does not address the storage service being down or hung. Had this happen to one of my nodes in the middle of last night. Received Storj emails from all 4 satellites saying the node was down while UptimeRobot showed all green status bars when I checked in the morning. Still better than nothing.

ACarneiro · July 20, 2024, 8:59am

Which port are you monitoring?
I have never had uptimerobot fail to notify me of a down node (although it can sometimes take a bit longer than it was supposed to)