Online score is dropped and node is suspended

Kopcap · June 12, 2024, 12:52pm

You are very attentive and noticed a slight Online drop of less than 2%.
I only monitored the online status of my nodes and the status of docker containers and everything was fine. But yesterday I received messages that some of my nodes had been suspended for being offline.
In fact, all my nodes have been working non-stop all this time, receiving daily traffic. But all those that receive traffic through port forwarding from other IPs have a lowered online score.
The nodes that share the same IP show the largest Online decrease.

JWvdV · June 12, 2024, 2:56pm

What brand is your router?

Kopcap · June 12, 2024, 3:14pm

The most powerful of the Keenetic range: Keenetic Peak (KN-2710).

MarviBiene · June 12, 2024, 3:19pm

I had the same issue, but it was definitely my issue: receiving daily traffic while the online score kept dropping. Monitor your nodes and check whether you have connection issues. I myself have two monitors running. The first is uptime robot (every 5 minutes) and the other one is my own (every minute).

I can see on my own monitor that the nodes sometimes have timeouts of 30 seconds or more, which causes the online score to drop.
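A one-minute external check like the one described above could be sketched roughly as follows. This is only an illustration, not MarviBiene's actual monitor: the host name `node.example.com` is a placeholder, port 28967 is assumed to be the node's public port, and a plain TCP connect is a coarser test than what a satellite really does. It has to run from a machine outside your own network to say anything about how the node looks from the internet.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def monitor(host: str, port: int, interval: float = 60.0, rounds: int = 3) -> list:
    """Probe repeatedly and return a list of (UTC datetime, reachable) samples."""
    samples = []
    for i in range(rounds):
        ok = probe(host, port)
        stamp = datetime.now(timezone.utc)
        samples.append((stamp, ok))
        print(f"{stamp.isoformat(timespec='seconds')} {host}:{port} "
              f"{'up' if ok else 'DOWN'}")
        if i < rounds - 1:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Placeholder address: replace with the node's public host and port.
    monitor("node.example.com", 28967, interval=60.0, rounds=3)
```

Lowering `timeout` makes the check stricter, so short stalls that a 5-minute checker would miss show up as DOWN lines.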

JWvdV · June 12, 2024, 3:38pm

I don’t know the router brand or the specific router. But might there be DDoS protection in play, which might block the audits?

Kopcap · June 12, 2024, 4:10pm

2+ years without any problem until this f** test started

Kopcap · June 12, 2024, 4:25pm

I set up uptime robot for a few nodes.
Yes, sometimes it shows “your XXXX is down” and then “is up” a minute or two later, but these warnings are not very frequent, and even immediately after a warning the node dashboard shows “online”. There are no errors in the node log.
Well, and what can I do with these warnings?
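One thing that can be done with such warnings is to turn them into a list of outage windows, so their times can be compared with the router and VPS logs. A rough sketch, assuming the monitor output is available as `(datetime, reachable)` pairs; the function name and the data layout are made up for illustration and are not an uptime robot export format:

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end, duration) outages.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end of the data is
    reported with end and duration set to None.
    """
    windows = []
    start = None
    for stamp, ok in samples:
        if not ok and start is None:
            start = stamp                      # first failure opens a window
        elif ok and start is not None:
            windows.append((start, stamp, stamp - start))
            start = None                       # success closes the window
    if start is not None:
        windows.append((start, None, None))
    return windows


if __name__ == "__main__":
    # Invented sample data: one probe per minute, two failures in a row.
    t0 = datetime(2024, 1, 1, 12, 0)
    flags = [True, False, False, True]
    data = [(t0 + timedelta(minutes=i), ok) for i, ok in enumerate(flags)]
    for start, end, length in outage_windows(data):
        print(start, end, length)
```

The durations are only as precise as the probe interval, so a one-minute monitor cannot tell a 5-second stall from a 55-second one.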

JWvdV · June 12, 2024, 4:33pm

You can lament this test, but this probably is the future of Storj. So if you hadn’t run into trouble now, you might have run into trouble some months or a year later. Please just get over it, and take this test as a heads-up that your config apparently isn’t up to par for the future needs of Storj.
The biggest difference with this test is the increase in bandwidth and therefore also the frequency of contact with the same IP. So if there is an option to disable DDoS-protection, disable it. Furthermore, you should check the logs for errors. Also check whether your router shows any errors and whether its resources are being starved (I had an ASUS router in which adaptive QoS took so many resources that the router sometimes couldn’t keep up; after disabling it, it worked perfectly).

Kopcap · June 12, 2024, 7:57pm

I expect my nodes to get less traffic if my bandwidth is not sufficient, and the new node selection rules actually tune that.
But the current online audit needs some tuning, too.
Nodes are in fact ONLINE all the time, and satellites do know they are online since there is some traffic to these nodes all the day.
Give that node less traffic, but don’t disqualify it!

JWvdV · June 12, 2024, 8:25pm

This strengthens the idea of DDoS-protection being the culprit: ingress is coming from many different IPs, but the audits only from some.
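To illustrate that reasoning, here is a toy model of a per-source rate limit. It is not how any particular router works, and the addresses, request counts and limit are invented; it only shows why a few frequently contacting sources can be throttled while much heavier traffic spread over many sources passes untouched.

```python
from collections import defaultdict


def simulate(requests, limit_per_window, window=60.0):
    """Apply a fixed-window per-source limit and count dropped requests.

    requests: iterable of (time_in_seconds, source_ip) tuples.
    Returns a dict mapping each throttled source to its dropped count.
    """
    seen = defaultdict(int)     # (source, window index) -> requests seen
    dropped = defaultdict(int)  # source -> requests dropped
    for t, src in requests:
        key = (src, int(t // window))
        seen[key] += 1
        if seen[key] > limit_per_window:
            dropped[src] += 1
    return dict(dropped)


if __name__ == "__main__":
    # 100 hypothetical uploader IPs, 5 requests each within one minute.
    reqs = [(i, f"10.0.0.{n}") for n in range(100) for i in range(5)]
    # One hypothetical satellite IP, 30 requests within the same minute.
    reqs += [(i * 2, "192.0.2.1") for i in range(30)]
    print(simulate(reqs, limit_per_window=20))
```

In this made-up run the uploaders send 500 requests in total and lose none, while the single source that sends 30 is the only one that gets cut off.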

Since the problem isn’t being mentioned by many, it is probably your system that needs tuning rather than the audit process.

So, again: does your router have errors in its log, or DDoS-protection?

Please stop lamenting about stuff you don’t know and that isn’t likely the culprit. If you think something might be wrong on Storj’s side, you might consider asking.

Alexey · June 13, 2024, 6:51am

You may check when your nodes were not available:

Kopcap · June 13, 2024, 8:08pm

I double-checked with support: there is no DDoS protection in the router.
I checked the logs: no errors were found.
Let me explain my concern.
Some time ago, I studied the minimum requirements, deployed an infrastructure that meets them, and grew nodes of a significant volume.
Now, the requirements for those nodes that will serve new customers are changing: in particular, wider internet channels and more professional routers are required.
Perhaps I will upgrade and be able to meet the new requirements over time. Until then, I am ready to give up on node growth and limit myself to minimal traffic from new customers.
But why am I being deprived of the ability to continue storing old data and receiving compensation for it due to unexpected increased requirements?
And now, because of the “new reality,” my nodes not only get deprioritized when choosing nodes for new traffic, but also have a risk of being disqualified due to this excessive traffic!

Mitsos · June 13, 2024, 8:16pm

Are you running multiple IPs through the same router?

Also define significant volume.

ACarneiro · June 13, 2024, 8:56pm

Errrr… you’re not. The old data you have is data that you have. It’s not going to be deleted unless customers delete (just as it used to be).
If you limit your bandwidth you’ll still have node ingress. You can limit it to suit your needs. Nothing changes.
I’m not quite sure if I missed your point somehow…

Kopcap · June 13, 2024, 9:31pm

The problem is that my nodes are at risk of being disqualified because they fail online status audits - even though they are continuously online and receiving daily traffic. This issue particularly affects nodes that share the same IP address.

ACarneiro · June 13, 2024, 9:35pm

Yes, but maybe that’s because your connection is so saturated that audits are failing.

If you reduce the number of concurrent requests you may improve the situation.

You could try adding

storage2.max-concurrent-requests: 5

to your config.yaml and restart the node and see if that helps. You can tweak it up and down as required (but you must restart the node every time you change it).

Also, if you haven’t already done so, I would suggest enabling the lazy filewalker.

Mitsos · June 13, 2024, 9:37pm

Can you give us a bit more information on your setup so we can offer our help? I see nodes on native connections and nodes through port forwarding, in your posts. Are all of these running on the same host?

Kopcap · June 13, 2024, 10:06pm

Nodes on native connections (direct ISP) feel good.
Nodes that get traffic via port forwarding from a VPS are affected.
I assume that the limited VPS bandwidth during traffic overage hinders the passage of audits.
Currently, I have switched all nodes to direct ISP connection. Once the Online score recovers, I will try to limit the incoming traffic to the nodes by adjusting the node’s settings and switch them back to VPS.
What other methods, besides reducing max-concurrent-requests, can you suggest for limiting traffic to the nodes?

Mitsos · June 13, 2024, 10:10pm

First of all I would suggest next time you “study the requirements”, you include the ToS in that study as well.

You can try using some sort of QoS that prioritizes outgoing packets so the online requests get replied to.

Kopcap · June 13, 2024, 10:20pm

I think the outgoing traffic is not that large, and the replies to the requests will pass without additional prioritization.
The problem, apparently, is that these requests are not reaching the node.