Better solution for monitoring than UptimeRobot?

So I got it all up and running, and this is the result:

It warns me if the port is closed, if the node is offline, or if the Zabbix agent isn’t available, which means the system is unreachable.

The problem is failed audits due to timeouts: the port is open, but the service is unresponsive and cannot provide a piece for audit, because the system has become unstable and the service is too slow to respond (more than 5 minutes).
I have no idea how to detect this except by the lack of audits in the logs (usually you should receive at least 1 audit per hour).
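As a rough illustration of the "lack of audits in the logs" idea, here is a minimal Python sketch. It assumes each log line starts with an ISO 8601 timestamp and that audit requests contain the marker `GET_AUDIT`; both are assumptions about the log format, so adjust them to what your node actually writes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed marker for audit requests in the node log; adjust to your log format.
AUDIT_MARKER = "GET_AUDIT"


def last_audit_time(lines):
    """Return the newest timestamp among lines that mention an audit, or None."""
    latest = None
    for line in lines:
        if AUDIT_MARKER not in line:
            continue
        # Assumption: the first whitespace-separated token is an ISO 8601 timestamp.
        stamp = line.split(maxsplit=1)[0]
        try:
            ts = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            continue  # skip lines whose first token is not a timestamp
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        if latest is None or ts > latest:
            latest = ts
    return latest


def audits_stale(lines, now, max_gap=timedelta(hours=1)):
    """True if no audit was logged at all, or the newest one is older than max_gap."""
    latest = last_audit_time(lines)
    return latest is None or now - latest > max_gap
```

You could run `audits_stale` from cron against the tail of the log and send a notification when it returns `True`. The one-hour default follows the "at least 1 audit per hour" rule of thumb above and will be too strict for very new nodes.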

Maybe request the audit history on a schedule?
See Node Online status - #4 by Alexey

Also, a port checker inside the network can tell you that the service is still running, but it does not answer the question of whether it is available from outside your network (99.99% of cases).

With the exception of very new nodes which don’t have a lot of data yet, I would add.

If that app will be Android exclusive, you’re gonna be in big trouble mister! :see_no_evil:

Don’t worry, it will be Android and iOS :wink:
But at the moment I am too busy with my real job, so there is not a lot of progress unfortunately :neutral_face:

I’m testing out a solution that should take care of most considerations and issues.
@striker43 I will need an app or something similar on the monitoring device to make it work.

Can’t wait to hear more about what you’ve got in the works! Thanks, @SGC

I have used uptimerobot.com for a long time to monitor the HashBackup upgrade server.

I was thinking that, instead of just monitoring whether a port is open, Storj’s node software could have a special HTTPS endpoint for health checking, and UptimeRobot’s keyword feature could be used to check the results:

Keyword monitoring

Use keyword monitoring to check presence or absence of specific text in the request’s response body (typically HTML or JSON).

So when UTR fetches the /status URL, the node software does various health checks like:

  • write a dummy file in the blob directory and do an fsync(), make sure it doesn’t take “too long”, whatever that may mean (maybe a config option)
  • delete the dummy file
  • open any important SQLite databases and do a simple query to make sure it works to detect “database disk image is malformed” problems

If all these checks succeed, the endpoint returns a success keyword or JSON doc. If any fail, the endpoint returns a JSON doc with all of the errors encountered. UTR probably can’t do anything useful with that error doc, but maybe future monitoring software could.
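To make the proposal concrete, here is a minimal Python sketch of such a `/status` endpoint. It is not how the Storj node software works; it is a standalone illustration. The names `run_health_checks` and `make_handler`, the `HEALTHY` keyword, the 5-second write limit, and the JSON shape are all assumptions made for the example.

```python
import json
import os
import sqlite3
import time
import uuid
from contextlib import closing
from http.server import BaseHTTPRequestHandler
from pathlib import Path


def check_blob_write(blob_dir, max_seconds):
    """Write and fsync a dummy file, then delete it. Return an error string or None."""
    path = os.path.join(blob_dir, f".healthcheck-{uuid.uuid4().hex}")
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            f.write(b"ok")
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
        if elapsed > max_seconds:
            return f"blob write took {elapsed:.2f}s (limit {max_seconds}s)"
        return None
    except OSError as exc:
        return f"blob write failed: {exc}"
    finally:
        try:
            os.remove(path)  # delete the dummy file
        except OSError:
            pass


def check_database(db_path):
    """Open a SQLite file read-only and run a trivial query. Return an error or None."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only: never creates the file
    try:
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
    except sqlite3.Error as exc:
        return f"{db_path}: {exc}"
    return None


def run_health_checks(blob_dir, db_paths, max_write_seconds=5.0):
    """Run all checks and return a list of error strings (empty means healthy)."""
    results = [check_blob_write(blob_dir, max_write_seconds)]
    results += [check_database(p) for p in db_paths]
    return [r for r in results if r is not None]


def make_handler(blob_dir, db_paths, max_write_seconds=5.0):
    """Build a request handler that serves the check results at /status."""
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status":
                self.send_error(404)
                return
            errors = run_health_checks(blob_dir, db_paths, max_write_seconds)
            doc = {"status": "FAILED", "errors": errors} if errors else {"status": "HEALTHY"}
            body = json.dumps(doc).encode()
            self.send_response(503 if errors else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler
```

To try it, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("", 8080), make_handler(blob_dir, db_paths)).serve_forever()`, and point a keyword monitor at `/status` looking for `HEALTHY`. A trivial query will not catch every kind of corruption; SQLite's `PRAGMA quick_check` is more thorough but slower.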

I will be waiting for your PR with hope :slight_smile:

I was just down for 2 days with no idea things weren’t working. Thank you for the suggestion of UptimeRobot; I just got that set up and hope it works.

I am also now looking for another monitoring service to replace this. Colleagues recommend a monitoring service called HostTracker. Has anyone used them? Their prices seem reasonable, and the pricing tiers are designed for different needs.

Have you made any progress?
I’ve heard of NEMS Linux for local monitoring; however, I am not sure this is the correct application. I think it just pings the system. What are some ways to monitor the Docker container instead of, or in addition to, the host?

The main point is to monitor the availability of your node from outside your network, because inside the network it may run just fine while your ISP has connectivity issues, your external address has changed, your ISP has decided to place you behind their NAT (CGNAT), or your router is overloaded and has started to drop packets, etc.
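For illustration, a TCP port check is only a few lines of Python. The important part is where it runs: this sketch only tells you about external availability if you run it from a machine outside your network (a VPS, for example), pointed at your public address. The function name and the 5-second timeout are assumptions made for the example.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Keep in mind that an open port only shows that something accepted the connection. As noted earlier in the thread, it does not prove the node is responsive enough to pass audits.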

Snag a cheap VPS somewheres and install. Supports a whole range of monitoring options including TCP port and docker containers. Supports all sorts of notification pathways. Monitor as many things as your VPS can handle.

That reminds me, Network Chuck mentioned this in a YouTube video recently. I will check them out!

Why not use Prometheus, storj-exporter and Grafana?

I have this setup, and it alerts me on Discord whenever one of my nodes is down, or is up but not getting traffic.

Specifically

A running node is not the same as an externally accessible node.

Grafana gives you a free Grafana Cloud instance that they host.
I expose the Prometheus instance hosted on my computer over HTTPS with username and password authentication, and Grafana Cloud connects to it. If Grafana is not able to communicate with my Prometheus, one of my alerts (node down or node no traffic) will notify me on Discord.

The Grafana Cloud instance supports Prometheus out of the box, so you can push storj-exporter metrics directly to the Prometheus hosted by Grafana. I didn’t go with this method since it only keeps metrics for a rolling 30-day window, and I wanted to keep a year’s worth of metrics.
