Suggestion, short form: a health indicator in the dashboard that shows a score out of 100%, based on an estimate of log errors over time. Nothing fancy, nothing complex, just a simple way to scan a full day’s worth of logs in the blink of an eye.
Suggested Extension for Detailed Log Occurrence Overview
Clicking the Log Score opens a Log Overview page.
This page would contain a counter for each type of log occurrence, with associated tips, fixes, or hyperlinks to forum posts dealing with that particular log type.
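To make the per-type counter a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how the dashboard works: it assumes each log line is tab-separated as timestamp, level, component, message, and the names `log_overview` and `TIPS` (including the tip text) are made up for this example.

```python
from collections import Counter

# Hypothetical lookup from log message to a tip or forum link.
# The entry below is a placeholder, not real advice.
TIPS = {
    "download failed": "placeholder: link to the relevant forum thread",
}

def log_overview(lines):
    """Count WARN/ERROR/FATAL occurrences per (level, component, message).

    Assumes tab-separated lines: timestamp, level, component, message, ...
    Lines that do not fit that shape are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        level, component, message = fields[1], fields[2], fields[3]
        if level in ("WARN", "ERROR", "FATAL"):
            counts[(level, component, message)] += 1
    # Most frequent log types first, each paired with its tip (or None).
    return [(key, n, TIPS.get(key[2])) for key, n in counts.most_common()]
```

Each returned row is one line on the overview page: the log type, how often it occurred, and a tip if one is known.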
This could be further extended into a collective online database holding the total number of occurrences of each log type and a count of how many nodes are posting each one, so that Storj Labs engineers and interested SNOs can better troubleshoot and compare issues across many or all nodes.
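As a rough sketch of what such a database would compute, assuming every node reports a simple mapping of log type to count (the `aggregate` function, the report shape, and the node names in the usage note are all hypothetical):

```python
from collections import Counter

def aggregate(reports):
    """Combine per-node counts into network-wide totals.

    reports: dict mapping node id -> dict mapping log type -> count.
    Returns (totals, nodes_affected), both keyed by log type.
    """
    totals = Counter()          # total occurrences of each log type
    nodes_affected = Counter()  # number of nodes reporting each log type
    for node_id, counts in reports.items():
        for log_type, n in counts.items():
            if n > 0:
                totals[log_type] += n
                nodes_affected[log_type] += 1
    return totals, nodes_affected
```

For example, if a made-up `node-a` reports 3 occurrences of a log type and `node-b` reports 2, the total for that type is 5 and the affected-node count is 2.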
Rant and reasons below
I think most of us want to keep proper track of our storage nodes, but that’s far from easy…
As recently demonstrated by the whole orders.db debacle, a segment of SNOs ran their storage nodes with errors without noticing,
errors that, if solved quickly, wouldn’t have put either Storj Labs or SNOs in a difficult position. So to alleviate such issues in the future, I suggest what is basically a log indicator…
A simple 100% indicator that changes depending on how many errors there are in the log. I know this isn’t very accurate and will require some tuning to work, but I’m fairly confident that, with the correct exclusions and a very basic addition / subtraction model of cumulative errors vs. no or low errors over time (let’s say a day), it will work as a great indicator of node health.
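Here is one way the addition / subtraction model could look, as a minimal Python sketch under stated assumptions. The log format (tab-separated, ISO timestamp first, level second), the function names, and the `threshold`, `penalty`, and `recovery` values are all assumptions picked for illustration and would need the tuning mentioned above.

```python
def hourly_error_counts(lines, excluded=()):
    """Count ERROR lines per hour of day, skipping excluded patterns.

    Assumes lines like "2020-07-01T12:00:00.000Z<TAB>ERROR<TAB>...".
    `excluded` holds substrings of errors that should not affect the score.
    """
    counts = [0] * 24
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2 or fields[1] != "ERROR":
            continue
        if any(pattern in line for pattern in excluded):
            continue
        try:
            hour = int(fields[0][11:13])  # "HH" part of the ISO timestamp
        except ValueError:
            continue  # timestamp not in the assumed format
        if 0 <= hour < 24:
            counts[hour] += 1
    return counts

def log_score(hourly_errors, threshold=5, penalty=2.0, recovery=1.0):
    """Start at 100, subtract for noisy hours, add back for quiet ones."""
    score = 100.0
    for errors in hourly_errors:
        if errors > threshold:
            score -= penalty   # too many errors this hour
        else:
            score += recovery  # no or low errors this hour
        score = max(0.0, min(100.0, score))  # keep within 0..100
    return score
```

With these made-up defaults, a day with no errors stays at 100.0, while a day where every hour exceeds the threshold ends at 52.0. A short burst of errors early in the day recovers fully by the end of it, which is the "cumulative errors vs. low errors over time" behaviour in its simplest form.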
I initially went with a colored indicator, but I figured it would make much more sense to group it with the other 100% scores (audit, suspension, uptime…) by simply adding one called log.
P.S.
I initially had it configurable and color coded (loosely inspired by the colored log),
but after trying to write it down and thinking it through, I figured it would be better if all nodes ran on the same defaults; adding configuration would make the scores subjective or just plain wrong.
So it’s better to spend some time making it work in the beginning, and then it will just work “across the board for all time”.
And well, the 100% was obvious when I thought about it…
I haven’t added detailed descriptions of the suggested extensions, but I think the conceptual idea is well covered in the short form at the beginning of this suggestion post.