My uptime should be 100% on all satellites; I have not gotten any UptimeRobot notifications of downtime in months

sorry2xs · July 22, 2021, 6:02pm

I am glad I am not the only one with this issue. My uptime should be 100% on all satellites: I have not gotten any UptimeRobot notifications of downtime in months, and this machine runs only one storagenode (25 TB).

Alexey · July 23, 2021, 1:37pm

Would you mind providing the dashboard from UptimeRobot?

Also, please provide the result of this command:

PowerShell

foreach ($item in ((Invoke-WebRequest http://localhost:14002/api/sno).Content | ConvertFrom-Json).satellites.id) {
    ((Invoke-WebRequest http://localhost:14002/api/sno/satellite/$item).Content | ConvertFrom-Json).auditHistory.windows |
        Where-Object{$_.totalCount -ne $_.onlineCount} | ForEach-Object{Write-Host $item; $_ | Format-Table -AutoSize}
}

bash

for item in `curl -sL http://localhost:14002/api/sno | jq '.satellites[].id' -r`; do
    curl -s http://localhost:14002/api/sno/satellite/$item | \
    jq '{id: .id, auditHistory: [.auditHistory.windows[] | select(.totalCount != .onlineCount)]}'
done

See GitHub - AlexeyALeonov/online-score

sorry2xs · July 23, 2021, 6:02pm

Alexey · July 23, 2021, 6:27pm

As you can see, your uptime is 99.989% for the last 7 days.
That pretty much matches what you see on the dashboard.
You can also check on the UptimeRobot site exactly which days your node was offline and for how long.
With the provided scripts you can check this information on your storagenode too, but only roughly: they show the dates when it happened, not the exact time or the duration. You can divide the 12-hour interval by the number of audits and multiply by the number of successful checks to get a rough estimate of the time elapsed since the timestamp in the report.
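The rough estimate described above can be sketched in shell. This is only an illustration: it assumes a 12-hour window (720 minutes) and audits spread evenly across it, the function name `estimate_window` is made up for this example, and the values 15 and 14 are sample counts.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes audits are spread evenly over a 12-hour window.
# estimate_window is a hypothetical helper name, not part of storagenode.
estimate_window() {
    total=$1
    online=$2
    # A window without audits gives nothing to estimate from
    [ "$total" -gt 0 ] || return 1
    window_minutes=$((12 * 60))
    # Minutes of the window covered by successful checks
    online_minutes=$((window_minutes * online / total))
    # Remaining minutes, attributed to missed checks
    offline_minutes=$((window_minutes - online_minutes))
    echo "online=${online_minutes}m offline=${offline_minutes}m"
}

# Sample window: 15 checks in total, 14 of them successful
estimate_window 15 14
```

The numbers are only as good as the even-spacing assumption; a node with few audits per window will get a much coarser result.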

SGC · July 24, 2021, 8:56am

It's very difficult to keep it at 100%; even updating takes time.
And there are only 720 hours in a month, so even about 4 minutes of downtime is a 0.01% loss from the 100.00%.

On top of that you get 2 updates every month, which leaves you less than 3 minutes of downtime for each if you want to stay at 100%. I'm not sure at what point it rounds up, maybe from 99.95% or so.
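The downtime budget can be checked with a little arithmetic. A minimal sketch, assuming a 30-day month of 720 hours; `downtime_percent` and `budget_minutes` are hypothetical helper names used only here.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; assumes a 30-day (720-hour) month.
# downtime_percent and budget_minutes are hypothetical helper names.
month_minutes=$((30 * 24 * 60))   # 43200 minutes

# Percentage of the month lost to a given number of minutes of downtime
downtime_percent() {
    awk -v d="$1" -v m="$month_minutes" 'BEGIN { printf "%.4f\n", d / m * 100 }'
}

# Minutes of downtime that correspond to a given percentage loss
budget_minutes() {
    awk -v p="$1" -v m="$month_minutes" 'BEGIN { printf "%.2f\n", m * p / 100 }'
}

downtime_percent 7     # share of the month lost to 7 minutes of downtime
budget_minutes 0.01    # minutes available before losing 0.01%
budget_minutes 0.05    # minutes available before losing 0.05%
```

This treats the score as a plain fraction of wall-clock time, which is a simplification: the node's online score is estimated from audits, not measured by the minute.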

Then, because uptime is measured by audits, storagenodes with less data get a coarser estimate of their uptime, because that is what it is: an estimate.

If the node has enough data, it will be accurate to within a few minutes.
If the node has limited data, you might lose 5-10% or even more purely by chance, by getting an audit while the node is down, because audits are rare and a single miss then counts against a larger time frame (a window, or whatever it's called).

So the best you can realistically hope for is 99.99%, which is then rounded up to 100%. Otherwise random chance might take you to 100%, but it's really a near-impossible state.

Realistically, with a 99.95% online score you should be very happy. If your online score keeps dropping, you have a problem. If it stays around 90-100%, you most likely don't have any issue aside from the occasional instability. Even a single reboot a month will set you back a bit, and on some servers one might not even notice that the machine is stuck in a cycle of crashing and rebooting.

But 97% isn't bad. For smaller nodes it basically just means you rebooted; some will show 100%, others 97% or so. It's quite random with limited stored data for a satellite.

sorry2xs · July 24, 2021, 4:00pm

Well, here is a new screenshot of what the dashboard is displaying. The saltlake satellite has always been a poor performer.

Alexey · July 24, 2021, 4:30pm

Please read "How is the online score calculated? - Node Operator" to understand how it works.

zeroheat · January 13, 2023, 11:55am

Hello

I had an issue with my network provider around Christmas, where I was left without internet for almost 1 day. However, as you can see from the screenshot below, my percentages are still in RED and NOT at 100%. I noticed this after one of the updates, but I don't remember which version. Those percentages have been stuck for weeks. Do I have to restart the node for the % to refresh?

Stob · January 13, 2023, 12:04pm

No

TLDR - Online score is for 30 days. Keep online for 30 days and the score will return to 100%.

zeroheat · January 13, 2023, 2:40pm

OK, but I have been up for more than 2 weeks now and those percentages haven't moved at all. Previously, after 3 or 4 days of being online they cleared out or at least started going up towards 100%. Was something changed, or am I mistaken?

Stob · January 13, 2023, 2:54pm

One single offline event affects the Online score for a full 30 days. There is no way around this fact.

The Audit and Suspension scores have shorter lives and are dynamic, so they can change by the minute if an audit failure occurs but is then followed by multiple successful audits.
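To see why a single bad window lingers, here is a rough sketch. It assumes the online score is a plain average of per-window ratios (onlineCount / totalCount) over the sixty 12-hour windows in 30 days; that is a simplification for illustration, not the satellite's actual code. The names `online_score` and `sample_windows` are hypothetical, and the input is one "totalCount onlineCount" pair per line.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption): score = average of onlineCount/totalCount
# over all windows that had at least one audit.
online_score() {
    awk 'NF == 2 && $1 > 0 { sum += $2 / $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n * 100 }'
}

# Made-up sample data: 59 fully online windows plus one with 14 of 15 checks
sample_windows() {
    i=0
    while [ "$i" -lt 59 ]; do
        echo "15 15"
        i=$((i + 1))
    done
    echo "15 14"
}

sample_windows | online_score
```

Under this model the result stays below 100% for as long as the bad window is inside the 30-day range, and returns to 100% only once it drops out.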

zeroheat · January 16, 2023, 6:31am

OK, I understand. But it did not work like that before. Something changed, and that is what I am asking about. What was the change?
I have been a Storj operator for more than 3 years, and my internet connection drops at least 3 or 4 times per year. So I am 100% sure that the program did not react the same way in the past as it does now.

Alexey · January 16, 2023, 8:29am

You can check when your node was offline with these scripts.

Once 30 days have passed since the last downtime, it should recover.

zeroheat · January 16, 2023, 9:27am

OK, here is the output:

PS C:\Windows\system32> foreach ($item in ((Invoke-WebRequest http://localhost:14002/api/sno).Content | ConvertFrom-Json).satellites.id) {
    ((Invoke-WebRequest http://localhost:14002/api/sno/satellite/$item).Content | ConvertFrom-Json).auditHistory.windows |
        Where-Object{$_.totalCount -ne $_.onlineCount} | ForEach-Object{Write-Host $item; $_ | Format-Table -AutoSize}
}
12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo

windowStart          totalCount onlineCount
-----------          ---------- -----------
2022-12-15T00:00:00Z         15          14

12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo