Thanks for the reply, Alexey.
I can only get a graph for 24 hrs; I've looked all over the place and can't make it show more. Interestingly, I did have an outage of 4 minutes at 00:21, which is during the backup window, and I think Veeam caused a hang on the node while it synced disks. I've changed the setting now to only freeze the VM being backed up, as the storage node was lumped in with a load of other Dev VMs that aren't being used. So it was wrong of me to say there was no outage: while the storage node thinks it has been online the whole time, there was a blip where it failed to respond. Overall, though, it says my uptime for 30 days is 99.249%, so I don't understand why only north shows around 90%. EDIT: checked today and it is now up to 90.91%. I'm not good enough at math to understand this.
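A rough sketch of why the numbers could look like this, assuming (and this is my simplification, not the real implementation) that the satellite's online score is roughly "audits answered / audits attempted" over the tracking period. At very low audit counts, one missed audit dominates, and 9/10 vs 10/11 happens to match 90% vs 90.91% exactly:

```python
# Assumption: online score ~ answered / attempted over the period.
# This is a simplification of the window-based design, just to show
# how one missed audit dominates when total audits are tiny.

def online_score(answered, attempted):
    return 100.0 * answered / attempted

# With only ~10 audits in 30 days, one 4-minute blip that swallows
# a single audit has a huge effect:
print(round(online_score(9, 10), 2))   # 90.0  (one miss out of 10)
print(round(online_score(10, 11), 2))  # 90.91 (one more success arrives)

# Compare a busier satellite doing hundreds of audits:
print(round(online_score(999, 1000), 3))  # 99.9
```

So the jump from ~90% to 90.91% would be consistent with one extra successful audit landing, not with any downtime being "forgiven".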
Thanks, I've read it all; my points are below. I would love to hear other SNOs' views on this.
Things that I query:
A - From the implementation section: "Ideally even the least audited nodes should be audited multiple times over the course of a window." The assumption is that a window is 12 hrs, although this can be set in the satellite config (we don't know what that value is). From the stats I'm seeing on the north satellite, with 9 GB of data I'm seeing 10 audits in 30 days, so the assumption in the implementation is incorrect for small SNOs.
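Back-of-envelope arithmetic for that claim, assuming the 12-hour window length (which is my guess, since the actual satellite config value isn't public):

```python
# Assumption: 12-hour windows (configurable on the satellite, so a guess).
audits_per_30_days = 10               # observed on the north satellite, 9 GB held
windows_per_30_days = 30 * 24 // 12   # = 60 twelve-hour windows in 30 days

audits_per_window = audits_per_30_days / windows_per_30_days
print(round(audits_per_window, 2))    # ~0.17
```

So a small node sees roughly 0.17 audits per window, nowhere near "multiple times over the course of a window"; most windows contain no audit at all.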
B - Section 3 of the implementation, under Network: "If we decide not to attempt retries, we should adjust the offline threshold accordingly to account for offline false positives and ensure that even the smallest nodes are still audited enough that any false positives should not pose a real threat." I think I might be seeing exactly this: my shards on north are very small compared to the total shards on the network, so I get audited very little, and a small downtime of 4 minutes produces a much bigger swing in my online score.
This probably isn't a big thing at the moment, but when we have 20k SNOs and data ingress of 128 kbps per node (a made-up figure), new SNOs will take a long time to vet and fill up, and could get caught by this issue of a dropping online score: low audit counts plus a low shard count make it hard to join the network before DQ.