How to know if SN has issues?

With all the new code that prevents the SN from coming online when issues are detected, what are SNOs doing nowadays to find out that they have an issue?

I have uptimerobot monitoring my SN but (in my understanding) that is just “pinging” the port number and doesn’t actually know if there is an issue with the SN. So if the SN fails to start with the new code releases, is uptimerobot still good enough to know that I need to remote in and check things out?

Does the SN or Docker leave the port closed if an issue is identified, allowing uptimerobot to alert me? Or is the port opened fairly early in the process, so that if an issue is identified I may never know (or at least not in a timely manner)?
Also, I'm assuming that if an error is detected with the new code, the dashboard will show offline?

From my understanding of uptimerobot and port checks:

If your SN can't start properly, uptimerobot will not be able to reach the SN's port and will alert you that something has happened.

We are still in beta, so you have to check the logs for errors and use scripts to check for audit failures. All the features you mentioned, such as getting alerts and having errors displayed on the dashboard, will be implemented :soon:
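
As a rough illustration of the "use scripts to check for audit failures" idea, here is a minimal sketch that counts suspicious lines in a log. It assumes the log is plain text with one entry per line and that a failed audit line contains both `GET_AUDIT` and `failed`; those markers are an assumption about the log format, not something confirmed in this thread, so adjust them to what your node actually prints. The function name `count_matching` is made up for the example.

```python
from typing import Iterable, Tuple


def count_matching(lines: Iterable[str],
                   markers: Tuple[str, ...] = ("GET_AUDIT", "failed")) -> int:
    """Count log lines that contain every marker string.

    The default markers are an assumed signature of a failed audit;
    change them to match your node's real log output.
    """
    return sum(1 for line in lines if all(m in line for m in markers))


if __name__ == "__main__":
    # Made-up sample lines, only to show how the function is called.
    sample = [
        "INFO piecestore downloaded Action: GET_AUDIT",
        "ERROR piecestore download failed Action: GET_AUDIT",
        "ERROR piecestore upload failed Action: PUT",
    ]
    print(count_matching(sample))
```

Run against a real log, you would pass the open file object instead of the sample list, since a file iterates line by line.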

Uptimerobot will only notify you (if you choose to be notified) if your port is inaccessible. For any specific issue, you can always check the forum.

Then your storj port won’t appear “open” to uptimerobot, so yes, it will detect this kind of problem.

You can test this by simply stopping the storagenode Windows service; within 5 minutes, uptimerobot will notify you.

I don’t think you can ping a port. Uptimerobot checks whether the port is open. For the port to be open, the service needs to be running, and if it’s running it should be OK. But this is my understanding; correct me if I’m wrong.
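
To make the "checks if the port is open" part concrete, here is a minimal sketch of a TCP port check. It is not how uptimerobot is actually implemented (I don't know that); it only shows the general idea of trying to open a connection and treating a refusal or timeout as "down". The function name `port_is_open` and the host and port in the usage line are placeholders.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("your.node.example", 28967)` would return `False` while the service is stopped, assuming that is the address and port your node listens on. Note that a successful connection only proves something is listening, not that the node is healthy.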

That’s what I already wrote regarding uptimerobot.

But as @nerdatwork said, it only notifies you if the SN port is inaccessible. Any other issue that keeps the node from operating properly (while the port connection itself still works) has to be found in the logs.

So all the error checking happens before Docker/the SN opens the port, which means a failed start would trigger the uptimerobot alert.
From there, for now, I would need to check the logs for errors. Since the errors have only just started and the SN is offline, they should be easy to find at the bottom of the logs, so a tail of 50 lines should be enough to find the problem.
Then, hopefully soon, the dashboard will replace the need to go to the logs to identify the problem (I’ll still need them for details)?
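
The "tail 50 and look for errors" step could be sketched like this. It assumes the log has been written to a plain text file and that error entries contain the word `ERROR` or `FATAL`; both the file layout and those level names are assumptions, and `recent_errors` is just a name picked for the example.

```python
from collections import deque
from typing import List, Tuple


def recent_errors(path: str, n: int = 50,
                  levels: Tuple[str, ...] = ("ERROR", "FATAL")) -> List[str]:
    """Return the lines among the last n lines of a log file that mention a level."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # A deque with maxlen keeps only the final n lines while reading.
        tail = deque(f, maxlen=n)
    return [line.rstrip("\n") for line in tail
            if any(level in line for level in levels)]
```

If the node has just failed to start, the relevant entries should be within those last lines; errors from earlier in the file are deliberately ignored.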

This is not really relevant. If the network has a problem, how can it notify you that there is a problem? Can you call the phone company if your phone is down?

Just monitor the logs in ELK or something similar and you can draw your own conclusions. But in reality, does it matter? If the network is down, there is no shit you can do about it.