Receive an email every time a node goes offline

I have created a project on GitHub that works on Linux and sends email alerts (via ssmtp) every time the node encounters errors:
github link
Feel free to check it out. I am open to any suggestions :wink:

8 Likes

Thanks for the contribution tachyontec!

3 Likes

You’re welcome, I am also glad to be part of the project :wink:

2 Likes

I appreciate your contribution!

But I do have a few remarks. It seems this script emails you when errors occur, not when the node is offline. That’s still useful, but you should probably know the following.

  • Older nodes have way more log lines than 40 per 10 minutes. You will be missing a lot of errors with the current setup. My node has over 100 log lines per minute.
  • Nodes with lots of traffic see frequent errors that aren’t really indicative of a real problem. In my case roughly 0.1% of downloads and 0.04% of uploads end in an error that isn’t serious. That gets really spammy.
  • The most serious errors that actually end up killing the process are logged as FATAL instead of ERROR. So you should definitely include those.
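To illustrate the last point, here is a minimal sketch of matching both levels in one pass. It assumes the level appears as an upper-case word somewhere in each log line; the function name `count_levels` and the sample lines in the usage note are made up for illustration and are not taken from the project.

```python
import re

# Assumption: the log level is written as a standalone upper-case word.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def count_levels(lines):
    """Count how many lines are logged at ERROR and at FATAL level."""
    counts = {"ERROR": 0, "FATAL": 0}
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Called on three hypothetical lines, one each at INFO, ERROR and FATAL, this returns one ERROR and one FATAL, so the FATAL line is no longer missed.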

Hope that helps improve things!

6 Likes

OK, thanks for the help!
The 40 lines are for simplicity. This program is going to run 24/7 on the node, so personally I don’t want to look through all the logs from each 10-minute window. The thinking is:
→ If the node goes offline there will be loads of errors, which will be sent by email, and the operator will know what is happening. Simplicity and minimum effort are key here.

I could write a script that records where the last snapshot ended, takes all log lines written since then, and checks all of them for errors. But as you said, that is a lot of logs…
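One way to do that without re-reading old lines is to remember a byte offset instead of a timestamp. The sketch below is only an illustration of the idea, not the project's code: `read_new_lines` and the state file are hypothetical names, and it assumes the log is a plain text file that only grows (or is truncated on rotation).

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete log lines appended since the previous call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)

    # If the file is smaller than the saved offset, assume it was
    # rotated or truncated and start again from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Only consume up to the last newline, so a line that is still
    # being written is picked up whole on the next run.
    end = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        f.write(str(offset + end))

    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Each run then only costs as much as the lines written since the previous run, however large the log has grown.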

To be honest, though, I completely forgot the FATAL keyword, so I will add it to the repo.

Anyway, THANKS A LOT for your feedback. I appreciate your help, not only for me but for everything you do for this community. Keep it up :wink:

3 Likes

Thank you for posting this!!!

1 Like

So,
As a continuation of @BrightSilence’s advice, I just committed some new changes to the repo. The main script now checks all logs since the last check, and I also made some modifications of my own. The main idea now is:
-> Users set an update interval (how often they want to check whether the node has encountered errors)
-> Because we check all logs since the last check (for example, the last 20m), the number of log lines differs every time, so the user sets a percentage and receives an email only if the errors are above it
-> Emails were really messy, containing all errors as plain text, so before the full text of the errors, some statistics are written, followed by each unique error with a number showing how many times it was encountered
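The threshold and the grouped summary could be sketched roughly as follows. This is a hedged illustration in Python, not the script from the repo: `summarize` and `build_report` are hypothetical names, the percentage is assumed to mean error lines over all lines checked, and the first whitespace-separated field of each line is assumed to be a timestamp that should be dropped before grouping.

```python
import re
from collections import Counter

# Assumption: the log level is written as a standalone upper-case word.
LEVEL_RE = re.compile(r"\b(?:ERROR|FATAL)\b")


def summarize(lines, threshold_percent):
    """Compute the error rate and group identical errors together."""
    error_lines = [line for line in lines if LEVEL_RE.search(line)]
    total = len(lines)
    percent = 100.0 * len(error_lines) / total if total else 0.0

    unique = Counter()
    for line in error_lines:
        # Drop the first field (assumed timestamp) so repeats collapse.
        parts = line.split(None, 1)
        unique[parts[1] if len(parts) > 1 else line] += 1

    return {
        "total": total,
        "errors": len(error_lines),
        "percent": percent,
        "should_alert": percent > threshold_percent,
        "unique": unique,
    }


def build_report(summary):
    """Render statistics first, then each unique error with its count."""
    out = [
        f"Checked {summary['total']} log lines, "
        f"{summary['errors']} errors ({summary['percent']:.2f}%)",
        "",
    ]
    for message, count in summary["unique"].most_common():
        out.append(f"{count}x {message}")
    return "\n".join(out)
```

The email would only be sent when `should_alert` is true, and its body would be the output of `build_report`, optionally followed by the full error text.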

As always, I am open to suggestions, and thanks to everybody for your good comments :slight_smile:

5 Likes