Besides having Grafana in place, can you advise some KPIs to watch, such as HDD health, online/offline status, I/O status, and update status (whether the node runs the latest version)?
I would like to set up some kind of alerting system so that I get notified when something, especially HDD health, needs my attention and a fix before the node gets disqualified.
If something like this already exists, or someone has set up something similar, please let me know.