Online status has been fluctuating for a few days

LinuxNet · November 5, 2020, 1:39pm

Hello everybody. I have been observing for 3-4 days that my online status is fluctuating a lot. However, only with the nodes that run with a dynamic IP. Sometimes the status is lower, sometimes higher. Strange.

Can this be due to the forced reconnect? Two of the three nodes with the problem were off for a few hours due to rsync last week.

I don’t notice any abnormalities in the log.

baker · November 5, 2020, 1:51pm

For the nodes with a dynamic IP, I am assuming you are using a DDNS service? I would guess your DDNS updater isn’t updating the IP quickly enough. I would start there.
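One way to test that guess is to compare what the DDNS hostname currently resolves to with the public IP the connection actually has. Here is a minimal Python sketch. It assumes you already know your current public IP (for example from your router's status page), and `ddns_is_current` is a hypothetical helper, not part of any DDNS client:

```python
import socket


def ddns_is_current(hostname, public_ip, resolve=socket.gethostbyname):
    """Return True if the DDNS hostname resolves to the given public IPv4 address.

    `resolve` is injectable so the check can be exercised without network access.
    """
    try:
        resolved = resolve(hostname)
    except OSError:
        # Resolution failed entirely (socket.gaierror is a subclass of OSError).
        return False
    return resolved == public_ip
```

Calling something like `ddns_is_current("mynode.example.net", "203.0.113.7")` (both values are placeholders) right after a forced reconnect, and again a minute or two later, would show how long the record lags behind. Cached DNS answers can make the lag look longer than it really is.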

LinuxNet · November 5, 2020, 2:02pm

Yes, of course I use DDNS. That was also my first thought, but when I look in Freshping, Uptimerobot or Hetrixtools then my nodes are available 24/7.

Since the beginning I have noticed that of course there is less traffic after the reconnect. But the nodes are never offline and can be reached immediately after the reconnect.

Only Zabbix says that my nodes are unreachable for 45 to 60 minutes a day. But I think that’s because I also run Zabbix on a dynamic IP: for some time after the reconnect, Zabbix cannot reach any of my servers.

The strange thing is that this has only been happening for a few days, but the nodes have been running for almost a year and it has never happened before. So I don’t quite believe that the reconnect is responsible for it.

baker · November 5, 2020, 2:14pm

A single missed ping from a satellite would cause your online time to be reduced by 0.25%. So it is likely a very small time lag. The free account for uptime robot only checks every 5 mins. Also, the uptime/online scores were only recently enabled and shown, so this could have been happening this entire time and you wouldn’t have seen a difference.
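To put numbers on why a periodic checker can miss a short gap: if a monitor probes at a fixed interval and the outage starts at a random moment, an outage shorter than the interval is only seen with probability duration / interval. A rough sketch of that arithmetic follows. The function name is made up for illustration, and it assumes instantaneous probes at a fixed interval:

```python
def detection_probability(outage_seconds, interval_seconds):
    """Chance that at least one fixed-interval probe lands inside an outage.

    Assumes probes are instantaneous and the outage starts at a uniformly
    random offset relative to the probe schedule.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if outage_seconds <= 0:
        return 0.0
    return min(1.0, outage_seconds / interval_seconds)


# A 30-second gap against a 5-minute check versus a 1-minute check.
p_five_minute = detection_probability(30, 300)
p_one_minute = detection_probability(30, 60)
```

With these example inputs the 5-minute checker would see the gap about one time in ten, and a 1-minute checker about half the time.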

SGC · November 5, 2020, 5:18pm

there are only a little less than 200 hours in a week… so 0.05% or 0.25% is like an hour or so over… tho it does take 4 hours for a node to be registered as offline currently… the count doesn’t start before that…

so any minor disconnects shouldn’t affect your online scores, so i would say it’s almost 100% sure that it has been due to the rsync downtime.

LinuxNet · November 5, 2020, 5:57pm

I have now had a little more time and looked again. I believe that the online time for Node 2 and 3 has decreased because of rsync. Node 1 was also offline for 2 hours last week because I moved my server to a different location at home. Node 3 was offline the longest and is also the one that has fallen the most, to 99.75%.

I’ll keep watching it for the next few days. Node 1 fell again very minimally, the other two are unchanged.

Yes, I know that Uptime Robot only checks every 5 minutes. But Freshping and Hetrixtools check every minute. I use all of them.

BrightSilence · November 5, 2020, 10:09pm

@SGC, you made the same claim about a 4 hour delay elsewhere, but it’s not true. The uptime system works based on audits now. If your node is offline during an audit it gets counted, even if it just went offline a second ago. It’s a little more complicated than this, but there is no 4 hour delay for uptime.

The uptime wasn’t counted before; I believe it was activated in v1.15. The downtime you’re seeing may have preceded that update though. The score covers the past 30 days, so it’s likely the downtime you had earlier.
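As a rough illustration of what a 30-day window means for the number you see, here is a sketch that treats the score as the plain fraction of wall-clock time online within the window. That is a deliberate simplification: the real score is derived from audits, and the function and the outage dates below are hypothetical, not satellite code or real data.

```python
from datetime import datetime, timedelta


def windowed_online_score(offline_periods, now, window=timedelta(days=30)):
    """Percent of the trailing window during which the node was online.

    offline_periods: iterable of non-overlapping (start, end) datetime pairs.
    Only the part of each period that overlaps the window is counted.
    """
    window_start = now - window
    offline = timedelta(0)
    for start, end in offline_periods:
        overlap_start = max(start, window_start)
        overlap_end = min(end, now)
        if overlap_end > overlap_start:
            offline += overlap_end - overlap_start
    return 100.0 * (1 - offline / window)


# One invented 2-hour outage, scored a week later and again over a month later.
outage = [(datetime(2020, 10, 28, 10, 0), datetime(2020, 10, 28, 12, 0))]
score_now = windowed_online_score(outage, datetime(2020, 11, 5, 22, 0))
score_later = windowed_online_score(outage, datetime(2020, 12, 1, 0, 0))
```

Under this simplified model a 2-hour outage costs roughly 0.28 points while it is inside the window, and it stops counting once it is more than 30 days old.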

kevink · November 5, 2020, 10:20pm

There is a 4-hour delay until the satellite counts your node as offline for uploads and downloads. But that has nothing to do with the uptime tracking. The online score is tested by audits. Those are independent systems.

BrightSilence · November 5, 2020, 10:24pm

I cut the quote off too early. @SGC specifically said the count wouldn’t start before then. I expanded the quote a little to clarify. But you’re right that the satellites keep trying to send and retrieve data from offline nodes for 4 hours, it’s just not relevant for the uptime tracking system.

SGC · November 6, 2020, 9:26am

@BrightSilence it’s a complex system, it takes time to be fully familiar with all the details…
my bad, i’m sure you have it right…

how about we get back to suggestions about what may be the issue with @LinuxNet 's storagenodes
rather than dwelling on my inaccuracies of understanding the yarn pile of details that weave the tardigrade backbone.

@LinuxNet

maybe there is some sort of issue with the uptime tracking audit kinda system… that got merged… another way might be to switch around the connection… move the ddns internet connection to a new node and see if the problem goes with the internet connection or stays with the node…

then at least you will have cut down the places the problem can be… it is pretty telling that it seems to be related to the ddns, which could be that something has changed somewhere… or maybe the internet connection is unstable… it can be difficult to track minute disconnects.

unless there is a lot of ping pong going back and forth over the connection, which may be why the storage node spots it and not everything else…

you can run a ping of something online… see if it works 100% also… most large scale sites don’t really care… i think… haven’t really checked the legal side of it…
but i doubt they would care if you ran a ping of their servers for half a day or two…
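A long-running check like that could be sketched in Python as below. Note that this swaps ICMP ping for a plain TCP connection attempt, since raw ICMP usually needs elevated privileges. The function names are made up for illustration:

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def watch(host, port, checks, interval=10.0, probe=tcp_probe, sleep=time.sleep):
    """Probe repeatedly and return the timestamps of every failed check."""
    failures = []
    for i in range(checks):
        if not probe(host, port):
            failures.append(time.time())
        if i < checks - 1:
            sleep(interval)
    return failures
```

Something like `watch("example.com", 443, checks=360, interval=10)` would cover about an hour. The host and port are placeholders, so pick a target you are allowed to probe. Gaps shorter than the interval can still slip through.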

like it was said earlier… takes a while to discover a disconnect, because it requires a check or a live connection / trigger type stream / connection to be broken.

stuff like networking gear can also go bad, so that a switch doesn’t really switch any more… not 100% anyways… more like 99% maybe… which can lead to connection instability.
it can basically be anywhere on the way along your route to the nearest internet nexus a piece of gear is bad… and it just takes one little switching circuit or whatever it is today to be worn out.

so starting to exclude options from what is wrong is my preferred approach to problem solving, if you can exclude 50% of what could be wrong… and then 50% of that, then only 25% of the options are left and you will have 4 times the chance of fixing it by randomly poking around

even if getting killed or destroying stuff may also sometimes go up…hehe a lot
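The halving idea above can be sketched as a bisection over the components between the node and the internet. This is only an illustration: the component names are invented, and it assumes a single fault, so that every check before the faulty component passes and every check from it onward fails.

```python
def first_bad_component(path, works_up_to):
    """Bisect an ordered path to find the first faulty component.

    works_up_to(i) must return True when traffic is fine through path[i].
    Returns the faulty component, or None if the whole path is fine.
    """
    lo, hi = 0, len(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if works_up_to(mid):
            lo = mid + 1  # fault, if any, lies after mid
        else:
            hi = mid      # fault is at mid or before it
    return path[lo] if lo < len(path) else None


# Hypothetical chain of gear with the fault placed at the router.
path = ["node", "switch", "router", "modem", "isp"]
suspect = first_bad_component(path, lambda i: i < 2)
```

Each check discards half of the remaining candidates, so two checks leave a quarter of them, which is where the "4 times the chance" comes from.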

BrightSilence · November 6, 2020, 11:38am

I have to admit that’s funny. You telling me to get back on topic. (Sidenote, I did actually respond to @LinuxNet with the most probable explanation)

Inaccuracies send people who are looking for an answer in the wrong direction. It’s completely on topic to address those. In fact it’s important in helping out the original poster.

Speaking of which

The uptime system got activated in a recent update and now shows the downtime that happened in the past. What’s the issue?

Where’s the evidence that this is related to the DDNS? It almost certainly is not: 3 different uptime tracking tools never saw the node offline, so the short times in which it is unreachable almost certainly haven’t been counted. Furthermore, the downtime tracking system only counts downtime if your node was down for at least 2 consecutive audits, which are unlikely to happen in such a short period of time. This negates most of your other suggestions as well.
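To make the "2 consecutive audits" rule concrete, here is a simplified reading of it as code. It is a sketch of the rule as stated in this post, not the satellite's actual implementation, and the audit timestamps are invented:

```python
def counted_downtime(audits):
    """Sum the time between consecutive failed audits, in seconds.

    audits: list of (timestamp_seconds, reachable) tuples sorted by time.
    A single failed audit surrounded by successes contributes nothing.
    """
    total = 0.0
    for (prev_time, prev_ok), (cur_time, cur_ok) in zip(audits, audits[1:]):
        if not prev_ok and not cur_ok:
            total += cur_time - prev_time
    return total


# One missed audit versus two missed audits in a row (made-up hourly audits).
brief_blip = [(0, True), (3600, False), (7200, True)]
long_outage = [(0, True), (3600, False), (7200, False), (10800, True)]
```

In this model `counted_downtime(brief_blip)` is 0 while `counted_downtime(long_outage)` is 3600 seconds, so a short unreachable period after a reconnect would not register unless two audits in a row happened to hit it.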

So… yeah, corrections prevent people from being sent on a wild goose chase by the assumption that something is wrong in the first place. If that’s off topic, then I guess this post is too. But I’m sure @LinuxNet is probably still happy to read that there is most likely nothing wrong and his node is simply reporting the downtime that actually happened a while ago.

LinuxNet · November 6, 2020, 12:36pm

I think I can rule out a problem with my internet because the two nodes at home each have their own IP and node 3 also has its own IP. All three affected nodes were offline for different lengths of time. None of the other nodes, which were never offline, have the “problem”.

It is fitting that the node that has been offline the longest has fallen the most.
Node 1 fell again slightly (but was also offline later than the others), the status of the other two remained the same.

When does the status rise again? After a week or after 30 days? I haven’t fully understood the tracking system yet.
Shame on me…

Can I read somewhere more precisely how the tracking system works now?

SGC · November 6, 2020, 6:55pm

no clue, people always seem too ready to beat people over the head for not having read the full code of the software, i think you can find what they call blueprints in the engineering section of the forum, where they will outline the basic concepts, but it’s usually pretty difficult to find exactly what one is looking for…

at least in my experience, but i suppose that’s to be expected if most people are coming from the outside with no inside knowledge of opensource or programming communities, if there is an easy way to navigate this i haven’t found it yet…

not sure if this is still relevant, since it’s only a draft… it was what i found when trying to search

you get the idea, then it’s just about finding the right one on the forum and hope it’s still relevant

BrightSilence · November 7, 2020, 10:22pm

There is indeed the blueprint that @SGC linked, which is probably the most complete explanation.
There was some discussion about this solution with suggested changes that seemed to be considered, so I’m not entirely sure it’s up to date on the details. Also have a look at the change logs as they contained some info on it as well.

I feel like that’s in large part directed at me and fair enough, I have corrected people a few times. I don’t mean to beat you over the head, so I apologise for that. What I’m trying to do is make sure people get the best information. Sometimes that involves correcting wrong information given by others. I don’t mean it personally.

Btw, I would never expect people to read the code. I only do that on rare occasions myself. In this case the information is available in blueprints and change logs.

Pac · November 7, 2020, 11:11pm

(I know I’m off topic but… “Perfumed approach”? xD Is that a real saying in English? )

SGC · November 7, 2020, 11:15pm

you are certainly part of it, but it’s far from a singular thing… it’s almost becoming a cultural thing nowadays… everything is “easy” to look up if one knows the name of it, and no matter what is said and no matter how deep the understanding people have, many are so quick to dismiss stuff after reading a couple of minutes on wiki.

even if people maybe have a decade long experience directly dealing with it for their every day work… not that that’s really the exact same thing here…
what you do is for the integrity of the forum, and maybe you are right more often than you are wrong… seems like it…

and to be fair to you i don’t really want to spread misinformation either… even tho it happens at maybe too large a frequency, and i will certainly try to improve, and i don’t really blame you, even if i sometimes end up feeling like punching my monitor.

learning new stuff is hard, and this seems especially so since the information is very fluid in many cases…

it’s not you, it’s me… or maybe it’s me and the whole open source programming world that doesn’t really agree with my frame of mind, being more used to dealing with laws of nature which … i am starting to appreciate so much more for being immutable in almost all cases…

even if their details will often get expanded, then the laws can never really be wrong and they will never change, even if we learn more details which gives us a deeper understanding and the ability to find loop holes, then the fundamental law is usually never broken…

so most likely just me, slowly realizing that what i learn from this i cannot use in a decade, even tho i find the whole cloud storage thing very interesting and will want to continue running a node… maybe the software aspect of it really isn’t me.

ofc it might help if i actually bothered trying to learn… but not sure i have the heart for mapping rain drops in a rain storm or making sand sculptures of understanding… xD

so that doesn’t help and would ofc make me more wrong than i care to be… ofc might help if i just did shut up a bit lol

but yeah it’s not directed at you…alas more me i think and i really cannot blame you or take it personal…

i may be relearning forgotten lessons, which is always a good thing…

no hard feelings

SGC · November 7, 2020, 11:17pm

its below preferred in the spellcheck
does it show that i doesn’t proof read xD

BrightSilence · November 7, 2020, 11:50pm

Awesome! I’ll try to be a little nicer about it to prevent people wanting to punch their monitor though, haha.

For what it’s worth, I know you are actually learning a lot. But yes, it is a fast moving world. The specific facts you learn now will undoubtedly be outdated in a decade, but the methods and ways of thinking about these kinds of solutions will still carry over. I can assure you that thinking about these problems will create brain paths that will accelerate problem solving for future use cases as well.

If I can leave one last tip: when you aren’t sure about something, just mention that. It still puts the info you’re sharing out there, but with a warning label that tells people to double check. I know you’re also trying to help people so your contributions are absolutely a net positive.

Alright, I think that’s enough one on one on a public forum for now. Though, luckily the subject seems to have been mostly concluded, so I’m sure we’ll be forgiven for the slight off-topicness here.
I think a good community acknowledges that we’re all human and stating some of these more human aspects out loud only fosters better conversation.

LinuxNet · November 8, 2020, 1:42pm

Everything is fine, no problem

I can confirm the solution again today after I had my third node off again for an hour: It was the offline time. Today, after that hour of downtime, I could see that the online value had dropped.

Thanks everyone for the help and information

LinuxNet · November 14, 2020, 11:21am

Unfortunately, my online status keeps falling even though the nodes haven’t been offline for several days. I can see several others in the forum reporting the same thing, so I don’t seem to be alone with the problem …

It’s only minimal every day. But why does it keep falling?
Can that have to do with the daily reconnect?