Online stats: quick to drop, not so quick to recover

Good evening!

I had a few hours of downtime this month due to an internet provider change (from VDSL to 1 Gbit/s optical fiber). My online stats immediately dropped to the values shown below (before the outage I was at 100% on every satellite). It's now been about a month since I changed provider. I had only one more downtime of 2 hours last week to finalize the provider change, and the values are still stuck at the percentages I got after just a few hours of downtime.

So the question is: how many hours must I stay online to recover, let’s say, 1 hour of downtime?

What is the formula behind the online numbers? Why, after about 10 days of uptime, haven't the numbers increased even by 0.01%? (I know they're rounded, but it's still a long time without any change.)

Thanks!
Pietro

You have a moving window of 30 days for the uptime, so the score can't recover while the day with downtime is within that window.

What @kevink said :slight_smile:
An offline event recovers 30 days later.

In other words, if a node has only one disconnection lasting several hours, for instance, that will hit the score immediately (let's say it drops to 98.20%), and the score is going to stay at 98.20% for 30 days, then recover all of a sudden to 100%.

I have not yet read the “Blueprint: tracking downtime with audits”, but the Storj team must consider using another method to measure storage node downtime!

Why? It would help to give at least an argument.

Many thanks to all for the clear explanation, I wasn’t able to find that in the docs, my fault.

The current method is actually the “other method”. The first implementation of downtime tracking was buggy. Please share if you have any suggestions for an alternative method.

I’m a noob, I don’t have much knowledge of programming, crypto, etc. I set up a node because it’s a good way to earn a passive income if done correctly! I don’t have a static IP, I have a dynamic IP, and if I’m not mistaken, every time my IP changes, the NO-IP software takes a while to pick up the new IP. During those few seconds or minutes while my IP is changing, the satellite considers my node offline.

Most routers have a built-in DynDns feature where no-ip (or other services) can be configured.
This usually works better, as the router notifies the DynDns service whenever the IP changes.

You should have a look at your router’s administration interface to see if it can handle that. If it can, you should probably replace the no-ip update software with your router’s DynDns feature.

Regarding dynamic DNS, my router, apart from the usual well-known services like no-ip, dyndns and others, has a very nice feature: custom dynamic DNS! You can enter a custom URL which will be called when the IP changes, with all the needed parameters (i.e. the new IP and other configurable settings).

So I bought a domain from Google Domains for €12/year and configured my router to call a Google Domains API which updates my DNS entry. It’s very quick: in less than a minute the change is propagated to other name servers, even though Google says it could take up to 48 hours.

Obviously, you must use a router which supports custom dynamic DNS update calls.
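
For anyone wondering what such a custom update call looks like: many dynamic DNS providers accept a plain HTTP GET with the hostname and the new IP as query parameters. Below is a minimal Python sketch that only builds such a URL and makes no network request. The endpoint, the hostname and the parameter names `hostname` and `myip` are assumptions for illustration, not the actual API of any particular provider or router, so check your provider's documentation for the real syntax.

```python
from urllib.parse import urlencode


def build_update_url(endpoint: str, hostname: str, new_ip: str) -> str:
    """Build a dynamic DNS update URL with the hostname and new IP as query parameters."""
    # Parameter names are an assumption; providers differ.
    query = urlencode({"hostname": hostname, "myip": new_ip})
    return f"{endpoint}?{query}"


# Hypothetical endpoint and hostname, using documentation-only addresses.
url = build_update_url(
    "https://dyndns.example.com/nic/update",
    "mynode.example.com",
    "203.0.113.7",
)
print(url)
```

A router with a custom dynamic DNS feature typically fills in the new IP itself through a placeholder in the configured URL, and the credentials are usually sent separately rather than as part of the query string.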

Downtime tracking is based on audits, and those audits are grouped into 12-hour windows. Although there is a 30-day window to determine suspension and disqualification (unimplemented right now, but will be turned on in the future), you do not need to wait 30 days for your online score to recover. To answer your specific question, every window in the 30-day tracking period has equal weight. So if you have an online score of 0.5 for one 12-hour window, that will be averaged with all the other 12-hour windows in the 30-day tracking period. After 30 days pass, that 0.5 score window will disappear and will no longer affect the online score.
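
To make the averaging concrete, here is a rough Python sketch of the calculation as described in this post, not the actual satellite code. The `AuditWindow` type and the function names are made up for illustration, and skipping windows without any audits is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class AuditWindow:
    """One 12-hour window: how many audits were sent and how many found the node online."""
    total_audits: int
    online_audits: int


def window_score(window: AuditWindow) -> float:
    """Fraction of audits in this window for which the node was online."""
    return window.online_audits / window.total_audits


def online_score(windows: list[AuditWindow]) -> float:
    """Equal-weight average of the per-window scores in the tracking period."""
    # Assumption: windows with no audits carry no information and are skipped.
    scores = [window_score(w) for w in windows if w.total_audits > 0]
    if not scores:
        return 1.0
    return sum(scores) / len(scores)


# 30 days of 12-hour windows = 60 windows; one of them has a score of 0.5.
period = [AuditWindow(10, 10) for _ in range(59)] + [AuditWindow(10, 5)]
score_with_bad_window = online_score(period)

# Once the bad window shifts out of the 30-day period, only perfect windows remain.
score_after_shift = online_score([AuditWindow(10, 10) for _ in range(60)])

print(f"{score_with_bad_window:.2%}")
print(f"{score_after_shift:.2%}")
```

With 59 perfect windows and one at 0.5, the average is 59.5 / 60, roughly 99.17%, and it stays at that value until the bad window leaves the tracking period, because every other window is already at 1.0 and cannot pull the average any higher.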

That said, with the online scores in your screenshots, you should be more than okay and do not have to worry about penalization once we enable it. We haven’t determined the exact threshold, but >90% will be fine for sure.

I am happy to explain more, but if you’ve gotten this far and are interested in learning more, you may as well read the blueprint that @nerdatwork linked above, as it will explain things more exhaustively.

EDIT:
As a followup (since it appears this post has been linked elsewhere), we are actually currently in the process of implementing a SNO change which will make audit downtime tracking windows accessible from the SNO API. This will allow node operators to see specifics about how many times they were audited in a given window as well as how many of those audits they were online for, for the full 30 day tracking period.

Current version of the blueprint (I don’t know how up-to-date the content of the post above is): https://github.com/storj/storj/blob/3fc76f4ffeb94385ebfa462de888511a84609789/docs/blueprints/storage-node-downtime-tracking-with-audits.md

Many thanks! Actually I was not worried about being suspended: after I moved to more performant hardware, including a CMR disk, and after I switched to a 1 Gbit/s optical fiber connection, I have had no errors or warnings at all, and my log is completely clean.

I was only wondering whether the online recovery mechanism was working or not.

Ok cool. And yes, your online score should change, but as mentioned above, you will need to wait 30 days before the “bad” windows will shift out of scope, so it could be that long before you notice an improvement (assuming you had/have perfect uptime outside of your 2 hours of downtime).

OK, thanks. No problem with the 30-day window. I’ve been a SNO for more than a year now, and being a Storj node operator is a long-term commitment.

I haven’t seen any errors in the log for several months, so I’m pretty confident I’ll recover my online stats.

Well, provided my new provider doesn’t decide to factory reset my router without warning me, as they did last time. I have a VM in the cloud which sends me a message if my node is offline, so this time I was able to recover quite soon after I got the downtime message.
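
In case it helps someone set up something similar, the check itself can be as simple as trying to open a TCP connection to the node's public port from outside. This is only a sketch of the idea, not my exact script: the function name is made up, and the hostname and port you pass in are your own (28967 is the usual default node port, but treat that as an assumption and use whatever you forwarded).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

You would call it from a scheduled job on the VM, for example `is_port_open("mynode.example.com", 28967)` with a hypothetical hostname, and send yourself a message when it returns False a few times in a row. Note that an open port only shows the node is reachable, not that it is answering audits correctly.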