My node is running 24/7 but the uptime is decreasing slowly!

SGC · November 25, 2020, 8:50am

that’s a good point… no matter if i got internet or not, wouldn’t i want to be notified… it’s kinda the whole point… i suppose one could make it so that it utilizes multiple people instead, that would make the range requirements less important and then it would have admin redundancy… lol

it isn’t enterprise before people are redundant…

i would still want to be notified tho… tho if one is redundant it would be nice if it won’t wake me at 3 am
ofc running the sms service costs about 10%+ of an internet connection… and if one had a truly redundant internet connection… not just hooked up in the same fiber junction box then it becomes not required because it could be done locally…

and just because internet maybe spotty or not working, doesn’t mean that it won’t work nearby

Pac · November 25, 2020, 1:19pm

I’ll be honest, that made me laugh

That said, situations where you barely have enough network signal to receive an SMS, but not enough for a data connection, are pretty rare… So the free version of UptimeRobot is still perfect for most people and 99% of cases, as the e-mail notification is pretty much like an SMS… in my humble opinion

SGC · November 25, 2020, 4:18pm

in many cases this is very true… but i do live in the country, and when you get into the less populated areas then getting 3G or 4G usually is the first thing to go… it’s simply a range thing… i don’t think even the newer data networks have the same range as the GSM network… but maybe… it’s been a while since i looked into all that… you could be right…

i just like the solution to be made to solve the issue in the best way possible and i think sms notification is the way to go for getting the best signal and the longest battery life on the receiving end.

but yeah if i lived near a big city it might not be a big issue… or if there weren’t many hills and forests around… i might totally agree… but i live much more rural

NS1 · November 25, 2020, 4:20pm

I just wanted to clarify that since I started the node last January, I never had this issue. It was in October, when I got the unsent order issue, that I noticed the uptime becoming extremely sensitive.

NS1 · November 25, 2020, 4:23pm

How did this discussion escalate from uptime to SMS, 3G, 4G, GSM networks, etc.?

SGC · November 25, 2020, 4:45pm

methods of tracking downtime, but yeah right back on track…

is your online score still dropping…?

i did a few tests on my own nodes, doing a reboot and shutting them down for several minutes, i think i checked it after i had an extended shutdown for 5-10minutes, which didn’t seem to affect the online score and then a reboot afterwards my nodes are still not greatly affected by it…

tho the 2nd smallest node was the most affected by it. having 300gb stored and ended up at 99.7% which is much lower than my 14.6TB node now at 99.97%-99.93%

and thats with like 20 minutes of downtime spread out over 3 times… so you must have had a fair amount of downtime, which is continually being added if the score keeps going down.

like alexey says it should drop 40% with 288hr of dt, ofc depending on node size the uptime score will be much more granular due to it being based on audits
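
A rough sanity check on that 40% figure, as a minimal sketch. It assumes the score is simply the share of a 30-day window (720 hours) spent online; that window length is an assumption made for the arithmetic, not a confirmed description of how the satellites calculate it, and the real score is audit-based and therefore coarser.

```python
# Sketch only: treats the online score as the plain share of an assumed
# 30-day window spent online. The real score is derived from audits.
WINDOW_HOURS = 30 * 24  # assumed window: 720 hours


def score_drop_percent(downtime_hours: float, window_hours: float = WINDOW_HOURS) -> float:
    """Percentage points lost for a given amount of downtime within the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    # Downtime cannot exceed the window itself.
    return 100.0 * min(downtime_hours, window_hours) / window_hours


print(score_drop_percent(288))  # 40.0
```

Under the same assumption, 20 minutes of downtime would cost less than 0.05 percentage points, so larger drops on small nodes are more likely down to how few audits they receive.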

my smallest 100gb node didn’t even get an audit while being offline for maybe 12-15minute in total so it’s still at 100%, meaning it wasn’t audited while being offline and thus didn’t decrease in online score.

you sure your internet connection is stable and you aren’t having maybe issues with maybe a ddns service or something… or your internet has to be unstable, the only question is how and why…

NS1 · November 25, 2020, 9:24pm

Is there any software I can use to check if my internet is stable?

Alexey · November 25, 2020, 9:37pm

Try uptimerobot.com
You will get some pretty nice graphs

SGC · November 25, 2020, 10:36pm

found it… ill repeat myself from the rambling below, don’t… DON’T ping any server 1000 times in 35ms like suggested in the network pinger… i think that was the last one i used… not sure…
enjoy
https://www.addictivetips.com/net-admin/best-ping-tools/

ramblings and reasons - venture at the risk of your sanity.
there is a wide range from tons of different companies, often years go by between the times i use them so i end up using a new one each time, they basically just ping a stable online server or ping your ddns name to resolve your ip address and then log it…

most often i just end up using a basic ping and let it run for a few hours maybe a day or two, which is usually more than enough to get an idea of what the problem is… ofc one has to pick a stable online server and one that doesn’t mind, because it does eat a bit into their network iops to be pinging them every second or more, i usually just use the default, see if it drops packets at a concerning rate.

@Alexey doesn’t uptime robot only check every 5 minutes or so? i seem to remember somebody saying that… but haven’t used it so pretty clueless

@NS1
i usually just ping some semi local server… i suppose sustained ping doesn’t take as much network resources as it did once… so maybe you can get away with google.com… but you also want an accurate ping that shows the true speed, and google is pretty high traffic but also usually located nearby…

looked for some good options for software, but found mostly cloud based online platforms, which will rarely do the network testing to the degree of accuracy one might want for storj, especially since the online score went live and can be insanely accurate… i calculated my node got an audit about every minute on an avg day within the last couple of weeks.

and that would mean busy days might be 5-10 times that, taking it into the second ranges or so i would expect … and i think i got a lot more audits in the past during high traffic, but not 100% sure.

thus verifying that the connection is stable at those time scales can be difficult without sending packets every second or so…

can’t find anything useful pretty sure i found a nice freeware one last time that would allow me to ping 8 different servers every second and keep track of it for days… looked through what stuff i have but cannot seem to find it… maybe it wasn’t as good as i remember and it got deleted.

i really like just using ping or ping -t if you are on windows
then ctrl+c to stop it on linux, freebsd and windows, i think…
and should give you a brief description of how many failed, but people don’t like getting their server spammed with ping’s ofc
so maybe thats why it’s more difficult to find these days, or the cloud platforms are just better… stuff like it always seems to change so much every few years, i almost stopped saving the programs…
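
For anyone who wants the plain-ping approach with a log, here is a minimal sketch. It assumes the system `ping` command is on the PATH and accepts `-n 1` on Windows or `-c 1` elsewhere; the function names and the one-second default are illustrative choices, not taken from any particular tool.

```python
import platform
import subprocess
import time


def ping_once(host: str, timeout_s: float = 2.0) -> bool:
    """Send a single ping with the system ping command; True if it got a reply."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Caveat: the exit code is only a rough signal; some platforms report
    # success for certain "unreachable" replies.
    return result.returncode == 0


def loss_percent(results: list) -> float:
    """Share of failed probes, in percent."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


def monitor(host: str, count: int, interval_s: float = 1.0, probe=ping_once) -> list:
    """Probe the host `count` times, printing a timestamp for each missed reply."""
    results = []
    for _ in range(count):
        ok = probe(host)
        results.append(ok)
        if not ok:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "no reply from", host)
        time.sleep(interval_s)
    return results
```

Calling `monitor(host, count=3600)` would cover about an hour at one probe per second, and `loss_percent` on the returned list gives the drop rate. Pick a host that doesn't mind being pinged that often.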

does really annoy me i cannot find a good one…

finally dug up the right name
ping tools or ping utility

i think i used something like the one named network pinger the last time, and maybe the 1500$ nr 1 14day trial version, but those types tend to require a manual to use often…

and don’t do as they suggest in the notes of network pinger and send 1000 pings in 35ms continuously
thats like cyber terror or whatever… but i think every second or two is acceptable, i cannot assure you the other side agrees tho, so be mindful of that… also slower settings may work for discovering a problem, but when in the seconds range you can see much more clearly if stuff works or not…

some packets will be dropped from time to time, i think the avg is like 0.25% so like 1 in 400
is normal… but it might not show up at all which means it works perfectly… and then if you get high variation in latency… the time it takes for the network packet to go out hit the server and for a response to come back… might just be one way time… i forget… not really important…

ping latency should be 10-15ms locally on fiber, maybe 45-50ms on DSL / cable type stuff locally / nationally… and then across the atlantic it’s like 250ms to maybe 450ms, maybe even worse…

usually microsoft.com, google.com stuff like that might work… but you will need a consistent low ping for it to be useful… and low is relative to your connection… but shouldn’t be worse than 100ms but the other end needs to be able to respond unencumbered also… pinging a slow server will get you a slow response…

if i ping google.com i get 15ms which is about what i would expect… really the internet tech and your geographical location can affect that a ton…
well enough rambling, hope you find it useful…
ill move the link to the top for convenience, in case you don’t make it this far…

congrats you made it…

baker · November 26, 2020, 2:03am

You could check your router/modem logs to see if your IP is changing a lot, or for other connection problems.

Pac · November 26, 2020, 7:21am

The free version of Uptimerobot does ping every 5 minutes and can’t be configured to ping more frequently than that, unless one upgrades to their premium service.

As a side note, it can be configured to ping less frequently than that though. My nodes get pinged every 20 minutes, I thought it was enough…

SGC · November 26, 2020, 8:00am

its for monitoring downtime… not for identifying / diagnosing network problems.
so it’s not really required to be a good tool for what it’s made for… and like i said… the regular ping can be enough for most testing, rarely i dig other stuff up…

it is… ping doesn’t require much of anything, but the frequency would ofc define the granularity of your data… if you ping the server, or uptime robot pings the server, every 20 minutes and we have 5 minutes of downtime… the odds of it being noticed are rather small, since there are literally only 3 moments in time that are checked over the hour… say 16:00, 16:20, 16:40
so as long as we don’t hit those with our 5 minutes of dt the uptime robot wouldn’t be the wiser…
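
The odds can be put into numbers with a small sketch. It assumes evenly spaced checks and a single outage that starts at a random moment relative to them; both are simplifying assumptions, and the function name is only illustrative.

```python
def detection_chance(outage_s: float, check_interval_s: float) -> float:
    """Chance that at least one evenly spaced check lands inside a single outage.

    Assumes the outage begins at a uniformly random moment relative to the checks.
    """
    if check_interval_s <= 0:
        raise ValueError("check_interval_s must be positive")
    # An outage at least as long as the interval is always caught.
    return min(1.0, outage_s / check_interval_s)


for interval_min in (20, 5, 1):
    chance = detection_chance(5 * 60, interval_min * 60)
    print(f"{interval_min:>2} min checks: {chance:.0%} chance of seeing a 5 min outage")
```

With 20-minute checks the 5-minute outage is seen about one time in four under this model; at 5-minute checks or faster it is always seen.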

ofc your uptime score is much more accurate than uptime robot can ever be.
and if the server or internet goes down and stays down 20 minutes will be fine…but it will certainly not tell you of any network issues.

the reason it’s so slow is the same reason i told NS1 not to send 1000 pings in 35ms
pings take up network iops and tho there are a lot of them, there is not unlimited capacity, it’s very closely related to DoS attacks, where you basically have thousands of computers ping or access one site to bring it down, because it can only answer so many requests at once before it starts dumping them, and when it does it will be dumping thousands, and anyone else that sends a request of any kind will most likely be dumped along with the storm of pings.

so pings above a certain frequency over extended periods are not looked fondly upon, but without fine granularity in the data on a connection, like say with a ping every second… it can be very difficult to see if the connection is truly stable, because like the previous example… at even 30sec checks i could pull a network cable for 10sec and plug it in again and odds are only like 1/3 for noticing it.

so my go to solution is just to start out with running a continuous ping every second or so, maybe more often if i know there is something wrong and am trying to see accurately what is happening, but all connections will show problems the more detail one goes into… if it’s stable every second then usually in my experience it’s stable enough for whatever usage… ofc today, with fiber giving 15ms national ping times, the granularity that was acceptable in the past might not be acceptable anymore…

but i’m sure it’s fine, so long as the ping is stable and for identifying if there is a problem or not, then one can do more accurate scans of the connection afterwards.

i would set it to 5 minutes, because there isn’t really any good reason not to.
i think the paid version goes down to 1minute which is much more useful for basic diagnostics… but ofc they want you to pay… so that makes good sense

ping is free heheh and default on all operating systems.

SGC · November 26, 2020, 6:07pm

Conclusion
there seems to be a 12 hour latency on the online score, so while it may look like it is dropping the connection may already have been stable for better part of a day or so… more on why i say that below

Ramblings and reasons

noticed today that my online score has now dropped significantly since yesterday, even tho i’m confident i’ve been online, there seems to be a good deal of hours of latency between when the initial offline event happens and when the online score reflects it…

something to keep in mind…
also the numbers are kinda weird… but i guess thats just down to random chance and audit uptime check granularity or something… looks a bit like it might need some fine tuning, there is too little data to say for sure, but it does seem to indicate that.

main node 14.6 TB

secondary node 303 GB
here we see the granularity showing up because some of the satellites didn’t notice it was gone, but it also did have slightly less downtime… all in all maybe 20minutes over 3 times i think it was, maybe 5 minutes more or less for the main one

and the last node the satellites didn’t notice was offline…
tho as my score on the others is still dropping, more may roll in later…
total uptime is now 32 hours and last update was 20hours ago… so the latency in the online score dropping from when the actual downtime events happened was well over 12 -14 hours most likely more…

which was kinda the point i wanted to make with this post… trying to keep it brief … sorry
figure it might be a little useful at least

oleg · November 28, 2020, 10:40am

I observe the same thing.
During continuous operation, the online indicator continues to decrease. My guess is that http://localhost:14002/ in v1.16.1 is not working properly.

oleg · November 28, 2020, 10:46am

if the ip address is permanent?

NS1 · November 28, 2020, 12:09pm

Since I started my node in January, I never had such a problem. It is only since October, when I got the unsent order issue, that I started getting this problem. Is the Storj team putting the blame on SNOs’ internet connections when the problem is coming from the software?

SGC · November 28, 2020, 12:28pm

i know my connection is rock solid and that i have caused this online score decrease.
and even after 40-60 hours saltlake still dropped a hair more in online score…
so clearly the delay from the offline time to it being fully registered as online score can be extensive.
multiple days if not more.

so even if your scores are dropping, it may still only be from one single or a few events in a short period… all mine was less than 24 hours apart might even be closer to 12 hours.

but if your internet connection is stable the numbers should stop dropping and then eventually start going up again… but that may take like a month, we will see, it was partly why i wanted to test… tho i may have more offline time coming before xmas because i am switching out my ISP, so if it takes a month for the online score to correct back to 100% or whatever it does… then it will take me at least until the middle of January to test it out.

so yeah, so long as your numbers don’t drop dramatically i wouldn’t worry about it…
i’m also unaware if the watchtower update would affect the online score…

i mean first time i tried going offline for just 30 sec… i still got hit on my online score…
so even the automatic update would most likely give a slight hit in the online score of big storagenodes as their audits come in a steady stream.

online score is also new… so will be prone to require adjustments from storj labs before working as intended… as i think i mentioned before.

Morisey · October 21, 2021, 5:49pm

Came looking for a solution to exactly the same problem. I’m on rock solid fiber optic, no downtime whatsoever, but for the past couple of months I’ve noticed my online status gradually decreasing bit by bit and not getting back to the normal 100% like before. Previously, even if I took an hour of maintenance, in a couple of days I was back at 100% online status. That’s not the case anymore: the online status is not increasing, only decreasing, even though I’m online all the time. I’m on a static IP, no DDNS and such.

Alexey · October 21, 2021, 6:35pm

Hello @Morisey ,
Welcome to the forum!

Do you have an uptimerobot.com graph?
You can use these scripts to figure out when your node didn’t respond to audit requests:

Then you can check your logs for that time: the storagenode logs (look for “ping satellite failed”), your firewall logs and your router logs.
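
The log check can be sketched roughly as below. This is not one of the scripts mentioned above, just a minimal illustration. It assumes each storagenode log line starts with an ISO-8601 timestamp; the pattern and the function name are hypothetical and may need adjusting to your log format.

```python
import re
from collections import Counter

# Assumption: log lines begin with a timestamp like 2021-10-21T18:35:02.123Z.
# Only the date and hour are captured, so failures are grouped per hour.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}")


def failures_per_hour(lines, needle="ping satellite failed"):
    """Count log lines containing `needle`, grouped by the hour they occurred in."""
    counts = Counter()
    for line in lines:
        if needle in line:
            match = TIMESTAMP.match(line)
            # Lines without a recognisable timestamp are counted separately.
            key = match.group(1) + ":00" if match else "unknown"
            counts[key] += 1
    return counts
```

Feeding it an open log file, for example `failures_per_hour(open(path))` with `path` pointing at your own log, gives a per-hour tally that can be lined up against the firewall and router logs.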

Morisey · October 21, 2021, 8:21pm

hmm, I do have quite a few “ping satellite failed” error messages in the logs. Tbh nothing has changed on the node side. I did change the router to a brand new Asus, since my old Netgear wasn’t getting proper updates anymore; I put the same port forwarding rules in place and checked that the ports are reachable from outside.
No idea why the online checks are failing, since there was no downtime.

I set up UptimeRobot, will see the result