Online issue with 1 satellite

Hi!

I've been getting this since yesterday:

Has north-europe gone mad?

I had a small “online” problem when my dog chewed the RJ45 cable. Turns out it’s not good for the connection… now I have 2 cables running to my node, and one of them is inside the wall… my dog can chew the wall, but it will take her more time…
Anyway, the offline status lasted for 20 min, or so…
us-central has been uploading more than 20GB per day to my node and doesn’t think I’ve been offline… west europe and us2 may have got it about right…
But europe-north went crazy… what can/should I do besides keeping the node online?

humbfig


I don’t see anything to worry about.

well, you’re probably right… but doesn’t it seem a bit random, the way the satellites compute offline periods?

Every satellite is independent. But as the uptime is still in the 90s, it looks OK to me.

The online score is only affected when you get audits, so during the 20 minutes of downtime you probably happened to get one or more audit requests from Europe North.
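Put another way (made-up timestamps, purely to illustrate the point, not the satellite's actual code): the satellite can only notice the outage if an audit request happens to land inside it.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical 20-minute outage and three audit requests.
	outageStart := time.Date(2021, 2, 27, 14, 0, 0, 0, time.UTC)
	outageEnd := outageStart.Add(20 * time.Minute)

	audits := []time.Time{
		time.Date(2021, 2, 27, 9, 12, 0, 0, time.UTC),  // hours before the outage
		time.Date(2021, 2, 27, 14, 7, 0, 0, time.UTC),  // lands inside the outage
		time.Date(2021, 2, 27, 21, 40, 0, 0, time.UTC), // node already back online
	}

	missed := 0
	for _, a := range audits {
		if !a.Before(outageStart) && a.Before(outageEnd) {
			missed++
		}
	}
	fmt.Printf("audits missed during the outage: %d of %d\n", missed, len(audits))
}
```

If no audit from a given satellite falls inside those 20 minutes, that satellite never learns you were down, which is why the scores can look so different from one satellite to the next.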

OK. So it's statistical "bad luck" with the europe-north satellite…
But still, us2 has been constantly uploading to my node; surely it couldn't have missed the connection going down. You mean that since it was only uploading, not auditing, it didn't count as an offline period?

No, they are just independent of each other, and the frequency of audits depends on how much data from that satellite's customers is on your node.

Going down!!!

C'mon, shouldn't logic apply?
Who's the mathematician at Storj?

If there's no error in the europe-north satellite, then it can't be as simple as random audits…

I can add that I was on 91.7% last week for this satellite; today it's down to 90%, but the node has been online. I only have 9GB stored for this one (on this satellite, not in total storage), so I'd assume that with such a small amount of data I only get an audit once every few days. Are we sure the math is correct here for the 30-day window, taking into account the rate of audits vs. storage vs. availability? It dropped from 100% to 91.7% in about a 24-hour period; I thought it would then slowly recover, but instead it keeps dropping.

(screenshot: node1-snap-feb-2021)

Edit: I just checked the 30-day window from 2021-01-29 to 2021-02-27, and for europe-north I had 10 audits on 9GB of data. Sorry for the bad charts; this isn't my strong area, and others are more skilled than me.

I'm sure it's working as intended, and I'm nowhere near 50% online, so no panic.

It’s late already, maybe I don’t get it:

Online: 90.91%
Online: 91.67%

I’d say it went up?

You can read details there:

Do you have a graph from uptimerobot.com?

Yes, it should, if you did not have any downtime (even a short one) again. The 30-day window moves on, but the previous drop is still inside it, and your node has now added a little more, so the average dropped further. Every downtime resets the count for the next 30 days online.
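To make that concrete, here is a rough sketch, assuming (my reading of the downtime-tracking design, not the actual satellite code) that the score is roughly the average online fraction over the windows that contained audits in the last 30 days.

```go
package main

import "fmt"

// score is the average online fraction over the windows that contained
// audits inside the 30-day tracking period (assumed model, see above).
func score(auditedWindows []float64) float64 {
	sum := 0.0
	for _, w := range auditedWindows {
		sum += w
	}
	return sum / float64(len(auditedWindows))
}

func main() {
	// 11 audited windows answered online, plus the one window where the
	// only audit hit the short outage.
	tracked := []float64{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0}
	fmt.Printf("right after the outage:        %.2f%%\n", 100*score(tracked)) // 91.67%

	// An old good window slides out of the 30-day period, no new audit
	// has arrived yet, and the bad window is still inside the period.
	tracked = tracked[1:]
	fmt.Printf("an old good window slides out: %.2f%%\n", 100*score(tracked)) // 90.91%

	// The next audit is answered online, adding a new good window.
	tracked = append(tracked, 1)
	fmt.Printf("a new good window is added:    %.2f%%\n", 100*score(tracked)) // 91.67%
}
```

That pattern matches the 91.67% and 90.91% values quoted above: with only about a dozen audited windows in the period, each window sliding in or out of the 30-day range shifts the score by roughly a full percentage point, even without any new downtime.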

Thanks for the reply, Alexey :slight_smile:

I can only get a graph for 24hrs :confused: I've tried looking all over the place and can't make it longer. Interestingly, I had an outage of 4 minutes at 00:21hrs, which is during the backup window, and I think Veeam caused a hang on the node as it synced disks. I've now changed the setting to only freeze the VM being backed up, as the storagenode was lumped in with a load of other dev VMs that aren't being used. So it was wrong of me to say there was no outage: it looks like, while the storagenode thinks it was online the whole time, there was a blip when it failed to respond. But overall it says my uptime for 30 days is 99.249%, and I don't understand why only europe-north shows around 90%. Edit: checked today and it's now up to 90.91% :thinking: I'm not good enough at math to understand this.

Thanks, I've read them all :slight_smile: Points below; I'd love to hear other SNOs' views on this.

Things that I query:
A - From the implementation section: "Ideally even the least audited nodes should be audited multiple times over the course of a window." The assumption is that a window is 12hrs, although this can be set in the satellite config (we don't know what it's set to). From the stats I'm seeing on the europe-north satellite, on 9GB of data I got 10 audits in 30 days, so the assumption in the implementation doesn't hold for small SNOs.

B - Section 3 of the implementation, under "network": "If we decide not to attempt retries, we should adjust the offline threshold accordingly to account for offline false positives and ensure that even the smallest nodes are still audited enough that any false positives should not pose a real threat." I think I might be seeing this: my europe-north shards are very small in comparison to the total shards on the network, so I get audited very little, and a small downtime of 4 minutes produces a much bigger swing in the online score (rough sketch below).

This probably isn't a big thing at the moment, but when we have 20k SNOs and data ingress of 128 kbps per node (made-up number), new SNOs will take a long time to vet and fill up, and they could get caught by the online score dropping, since a low audit rate and a low shard count make it hard to join the network before DQ.
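Rough sketch of what I mean (illustrative numbers only, not the satellite's actual scoring code; the real score is weighted per window, but the sensitivity argument is the same):

```go
package main

import "fmt"

func main() {
	// One audit missed during a 4-minute blip costs 10 points when there
	// are only 10 audits in the period, but just 0.1 points out of 1000.
	cases := []struct {
		name   string
		total  int
		missed int
	}{
		{"europe-north (~9GB stored)", 10, 1},
		{"a busy satellite (made up)", 1000, 1},
	}
	for _, c := range cases {
		online := 100 * float64(c.total-c.missed) / float64(c.total)
		fmt.Printf("%-28s %4d/%4d audits online -> %6.2f%%\n",
			c.name, c.total-c.missed, c.total, online)
	}
}
```

So a tiny node on a low-traffic satellite sees much larger swings from the same blip than a full node on a busy one.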

C.P.

The network will rebalance itself. If there is low usage, we will see greater node churn, especially from newbies who are not patient enough (a node only reaches the high league after 9 months in the network).
If demand increases, operators will add more nodes.

ah… where do you get those graphics?

Yep, you’re right… :blush:
I saw it going down once, and when a new change happened I didn't even look at the first two digits… my mistake, sorry…


Hi, they come from a free signup on https://uptimerobot.com/

It's really good for monitoring the external IP and port of the storage node; when you log in to the web portal, you can see the graphs.

I use uptime robot also.
I meant the audit history graphics.

I’m having the same experience with europe-north too.
I had some downtime a few weeks ago, so the online score dropped (to 85% or something), and over the following days it rose slowly to 91-92%. Then it fell to 90%, then went back up about 1% and down 1%, and it's been doing that ever since, although I've not had any more downtime.
All the other sats show >95% online; uptimerobot shows 98.5%.

Europe-North-1 does not have a lot of data, so audits are rarer than from other satellites, and the online score is therefore more sensitive than on satellites with more data.