Notifications about offline/suspended nodes

This is a complaint/feature request, not a plea for help. I’ll find the proper place for a feature request when I’m off work.

Six days ago my storj nodes went offline. Having been busy with various non-computer activities (sister graduating HS, various lawn/yard cleanup projects, etc.), I hadn’t checked the local web console for the storj nodes in a while.

Last night while setting up a new firewall, I happened to open up the web console and found that both nodes had been suspended for being offline too long. On the local web console was a long string of notifications stating that the node was offline, then that it “was going to be” suspended, and finally that it was suspended.
I rebooted the server, cleared the docker images and relaunched them. Everything came back online, albeit still suspended.

Two hours after fixing the issue, I received a small flood of emails notifying me that my nodes had been offline for 6 days and were now suspended.
I don’t want to homebrew/hackjob a custom monitoring solution to monitor log files looking for storj errors. I’ve got enough other stuff to worry about and troubleshoot between work, family, and various hobbies.

What I’d like to see is a simple “node offline” notification service, offered by storj, that automatically emails you when a node is offline and fails an online check.

How about Grafana? I have a grafana dashboard with email alerts.

I suspect that counts as “homebrew”.

uptimerobot.com isn’t homebrew, and it can alert your phone directly. It has saved me a lot of time and headaches many times, even at 3am.

How hard is it to add a watchdog that emails you when the node goes offline? I’m baffled as to why it’s not there already.
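
For a sense of scale, here is a minimal sketch of such a watchdog in Python. It is not anything Storj ships. It assumes the node is reachable on a public TCP port (28967 is used below as an assumed default; substitute whatever port you forwarded), that an SMTP relay is available, and that the script runs on a machine outside the node's network, since a check from the same host would go down with the node. The hostnames and addresses are placeholders, and the function names are made up for illustration.

```python
import smtplib
import socket
from email.message import EmailMessage


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def build_alert(host: str, port: int, sender: str, recipient: str) -> EmailMessage:
    """Build the notification email for a failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"Node offline: {host}:{port}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"TCP check to {host}:{port} failed.")
    return msg


def check_and_notify(host: str, port: int, sender: str, recipient: str,
                     smtp_host: str = "localhost") -> bool:
    """Send an alert if the node is unreachable. Returns True if an alert was sent."""
    if is_port_open(host, port):
        return False
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(build_alert(host, port, sender, recipient))
    return True


if __name__ == "__main__":
    # Placeholder values: replace with your node address and mail settings.
    check_and_notify("node.example.com", 28967,
                     "watchdog@example.com", "me@example.com")
```

Run from cron every few minutes, this only confirms that the port accepts connections, not that the node is passing audits, so it is a rough stand-in for a real online check.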

An excellent question. Setting up a separate server with a watchdog service wasn’t high on my todo list… I was under the impression that the built-in storj online/offline monitoring notifications were something I could rely on. I didn’t realize that it was going to hold the “hey this node is offline” notification until after the node was back online.