Downtimes climbing

BBQMan · September 3, 2021, 4:26pm

HI All,

I have a couple of nodes whose downtime is climbing, but they aren’t down at all. As a matter of fact, data is flowing well from the sats that are claiming downtime. Any ideas?

Alexey · September 3, 2021, 4:51pm

If you are talking about Uptime on the dashboard, that’s how long the node has been running (unrelated to online score or downtime).
If you are talking about Last contact, it’s updated when you refresh the page.
The only things that matter are the “ONLINE” indicator and the online score on the satellites.

If you mean the online score is dropping, then either the satellites cannot contact your node or your node cannot reach them to answer.

Please check that your firewall rules do not block incoming traffic from any source IP and any source port to the storagenode port on the local IP of the PC running it. You also should not block outgoing traffic from that local IP, on any source port, to any destination IP and port.
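A quick way to sanity-check the inbound rule described above is to try a plain TCP connection to the node's port from a machine outside your network. The snippet below is only a minimal sketch: the function name `tcp_port_reachable` is made up for illustration, and it tests TCP only, so it says nothing about UDP or about what the satellites themselves see.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_reachable("node.example.com", 28967)` would check a placeholder hostname on port 28967, which is assumed here to be the forwarded storagenode port; substitute your own external address and port.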

BBQMan · September 3, 2021, 4:55pm

My score is dropping. One would think if the node is sending and receiving data OK then the score wouldn’t drop. At least the logs indicate there is no problem of data flowing. I’ll double check the firewall and make sure.

Can I force the node to run only on TCP instead of UDP packets?

Alexey · September 3, 2021, 4:59pm

Yes, just remove the port forwarding rule for UDP.

Please search your logs for “ping satellite failed”.
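If the log is large, a few lines of Python can pull out the matching entries. This is a rough sketch, not an official tool: the helper name `find_failed_pings` and the sample lines are invented for illustration and do not reproduce the real storagenode log format.

```python
from typing import Iterable, List


def find_failed_pings(lines: Iterable[str],
                      needle: str = "ping satellite failed") -> List[str]:
    """Return every line containing needle, compared case-insensitively."""
    needle_lower = needle.lower()
    return [line.rstrip("\n") for line in lines if needle_lower in line.lower()]


# Made-up lines for illustration only; real log entries will look different.
sample = [
    "2021-09-03 10:00:00 INFO upload finished\n",
    "2021-09-03 10:05:00 ERROR ping satellite failed\n",
    "2021-09-03 10:06:00 INFO download finished\n",
    "2021-09-03 11:15:00 ERROR Ping Satellite Failed\n",
]

matches = find_failed_pings(sample)
print(f"{len(matches)} matching line(s)")
for line in matches:
    print(line)
```

To scan a real file, pass the open file object instead of the list, for example `with open("node.log") as f: find_failed_pings(f)`, where `node.log` stands in for wherever your node writes its log.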

BrightSilence · September 3, 2021, 5:56pm

I had this in the early days. Turned out my router was dropping packets left and right. At the time the only thing that helped was resetting my router. If you try that don’t forget to set up port forwarding again. But I’m not sure it’s still relevant as the uptime system has changed significantly since then.

BBQMan · September 3, 2021, 6:41pm

Kinda wondering if this is going on as well. I have a Fortinet Fortigate 60F which is pretty good but there was a firmware update a couple of weeks ago which coincides a little bit with the problems starting. I’m going to take UDP out of the equation to see if it helps any.

Thanks again for the suggestions!

SGC · September 3, 2021, 6:44pm

if it is a router problem disabling logs should help.
often consumer routers can be overloaded by excessive logs if they aren’t designed for high traffic.

we also had someone that was blocking traffic from most of the world a while back… he ofc also got downtime for that because the sats couldn’t see his node.

but assuming there is no such blocking, then dropping packets is certainly an option.
only takes one bad / misbehaving device or cable to mess up much of your network.

you could also have a node that is rebooting randomly… lots of options really as usual
bad isp that drops your connection from time to time because they are bad at their job.

best approach is to monitor and try to identify the problem… is it getting worse or better…

keep in mind online score takes a full 30 days to recover… so downtime will haunt your online score for a long long time.
it’s not uncommon to be below 100%

stuberman · September 4, 2021, 3:26am

The 60F is a great commercial firewall. Of course, the more advanced features you turn on, the lower your maximum throughput. Storj certainly does not push a lot of data, but if you have lots of other applications using that firewall then…

BBQMan · September 4, 2021, 1:33pm

Ya it’s a good one and we are barely tapping its ability IMO. CPU utilization is only 2% and we’re only pushing about 20mb up and 20mb down as a daily average.

Unfortunately I think something else must be going on. I posted stats on my 500gb node on a VPS in Germany a couple of weeks ago and it’s been really lucky, pulling in about $20 per month on average recently with an estimated $24 for September. I NEVER touch this node and it’s been online for 2 years or so. Now even it is dropping, so I’m thinking something else may be going on with the network. It seems that all of this started over the last week to 10 days, so I’m not sure if it could be related to 1.37.1 or not, but something is causing the issue.

I turned off UDP on a couple that had some big problems to see if it helps. If this awesome little 500GB node keeps dropping something is definitely up with the network.

BBQMan · September 4, 2021, 1:37pm

I turned off UDP on the nodes with the worst problem to see if that helps. Now I’m having issues with my best node which isn’t even on my work network.

Alexey · September 4, 2021, 1:47pm

Too few details. Please elaborate on the errors you received.
At the moment I can only say that the nodes must have something in common.
Perhaps you use the same firewalls or security rules.

BBQMan · September 4, 2021, 1:56pm

LOL, I assumed you looked at my previous post. My very best performing small node is now seeing the online scores drop. This node isn’t on my network nor have I changed anything on the node in months. While the drop is tiny, it is still some concern to me. I’ll keep watching and report back here if things continue.

Alexey · September 4, 2021, 2:00pm

Could you please show graphs from uptimerobot.com for that node?

BBQMan · September 4, 2021, 2:11pm

Alexey · September 4, 2021, 2:36pm

And results of the script:

BBQMan · September 4, 2021, 2:44pm

https://stats.uptimerobot.com/Mo8VZH20rM

Password “Apples”

The node we are working with is Contabo SJ1

Alexey · September 4, 2021, 5:17pm

There are dates when your node was not available to the satellites. It might not be a network issue; the node may simply not have answered an audit request.

SGC · September 4, 2021, 5:40pm

it’s a 500gb node, the smaller the node the more it tends to affect online score, you may simply have been unlucky to get hit by audits while the node was doing its update…

i wouldn’t worry about it unless it keeps dropping, you can have like 12 days of downtime before it will hit 60% and get you a suspension…

just monitor it a bit, doubtful doing anything more is worth the time.
keeping 100% can be very difficult.
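The 12-day figure follows from simple arithmetic if you treat the online score as the share of a 30-day window in which the node was reachable. The sketch below uses that simplified model with the 30-day window and 60% threshold mentioned in this thread; the function names are made up, and the satellites' real calculation may be more granular than this.

```python
def online_score_pct(downtime_days: float, window_days: int = 30) -> float:
    """Simplified online score: percent of the window the node was reachable."""
    # Clamp so the result always stays between 0 and 100.
    downtime_days = min(max(downtime_days, 0), window_days)
    return (window_days - downtime_days) * 100 / window_days


def max_downtime_days(threshold_pct: float = 60, window_days: int = 30) -> float:
    """Downtime that brings the simplified score down to exactly threshold_pct."""
    return window_days * (100 - threshold_pct) / 100


print(online_score_pct(12))   # score after 12 days offline in a 30-day window
print(max_downtime_days())    # days of downtime needed to reach the 60% line
```

Under this model, 3 days offline gives 90%, and each offline day stays in the score until it ages out of the window, which is why recovery takes the full 30 days.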

BBQMan · September 4, 2021, 6:49pm

Thanks to all as always! After losing a couple of nodes over the last 3 months I’ve been a little paranoid. Trying some multiple smaller nodes on the systems instead of the 15tb versions.