Not so... QUIC... Correctly "Misconfigured"

Hi. Can someone explain this?


Port 28967 is open on the router and the PC, and I've checked it with multiple tools.

This sometimes fails on the router's side; routers are not always good at handling UDP traffic. You can check this by rebooting your router. That may solve the issue (at least for a while).

Already did that. All my other services are ok.

Then you may try a hard refresh of the dashboard to clear the browser cache. The node re-checks the QUIC status at every check-in with the satellites (every hour by default).
Do you have several network interfaces on the host? If so, try binding the node only to the interface (local IP) that you used in the port forwarding rule.

I've done that too. I hard-refreshed the dashboard app and even tried opening it in several browsers. The host is a dedicated PC with just one interface and one IP, and it only runs Storj.
Either way, from reading through the forum, I see QUIC is not much to worry about. Since I haven't made any changes and Pingdom gives me an OK result, maybe it's a dashboard glitch. I'll just wait for the next update.

I can successfully connect to your address (as shown in the picture) with QUIC. I think you’re right, this may be some kind of dashboard bug.

You might try to find the message “Your node is still considered to be online but encountered an error” in your log. If a satellite is claiming it can’t connect to your node via QUIC, then it will give you a message indicating why.


My log doesn’t show that message. The only thing I find that could be related to this is:
2023-10-23T15:39:59+01:00 ERROR contact:service ping satellite failed {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 11, "error": "ping satellite: check-in network: failed to resolve IP from address: hotnet.freeddns.com:28967, err: lookup hotnet.freeddns.com on 10.103.0.10:53: read udp 10.101.6.2:35149->10.103.0.10:53: i/o timeout", "errorVerbose": "ping satellite: check-in network: failed to resolve IP from address: hotnet.freeddns.com:28967, err: lookup hotnet.freeddns.com on 10.103.0.10:53: read udp 10.101.6.2:35149->10.103.0.10:53: i/o timeout\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:203\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

When I try to access my link “hotnet.freeddns.com:28967”, I get this answer:
{
  "Statuses": null,
  "Help": "To access Storagenode services, please use DRPC protocol!",
  "AllHealthy": true
}

There you go. Switch to a better, less flaky DDNS provider.


I would... if my router weren't so restrictive about which DDNS providers I can use.
Either way, my domain is set up like this:

…and it works fine with all my other services…

arrogantrabbit is correct. We have seen this issue many times, and it is due to the DNS servers you are using. It's not the DDNS service itself; the DNS server you are using is having difficulty resolving the domain.

You're right, but it could be either or both. The TTL of DDNS records is very short, so DNS resolvers have to keep querying the DDNS provider. If that provider fails to respond, the resolver has no choice but to return a failure as well. And this happens for every DNS resolver that Storj customers may use. Rinse and repeat every minute.

Flaky DDNS providers may be unable to handle all that traffic and start failing requests.

The DDNS client does not have to run on the router. Run, e.g., inadyn on the same host that runs the storagenode.

My personal recommendation: use Cloudflare (inadyn supports it).

Yes, this is true. Either way, if the router is the issue, consider bypassing it and hosting the DDNS client on the server itself, so you can use a more robust DDNS provider if that turns out to be the bottleneck. That is one option, anyway. From there it gets more drastic, such as changing routers.
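What a DDNS client such as inadyn does on the host boils down to a small loop: check the public IP periodically and push it to the provider only when it changes. The Go sketch below shows only that logic. `publicIP` and `updateRecord` are hypothetical hooks standing in for an IP-discovery service and a provider's update API (Cloudflare or otherwise); they are not real calls.

```go
package main

import (
	"fmt"
	"time"
)

// updater keeps a DDNS record in sync with the host's public IP.
type updater struct {
	publicIP     func() (string, error) // hypothetical: discover the current public IP
	updateRecord func(ip string) error  // hypothetical: push the IP to the DDNS provider
	lastIP       string
}

// tick performs one check and calls the provider only when the IP changed,
// which keeps API traffic low. It reports whether an update was sent.
func (u *updater) tick() (bool, error) {
	ip, err := u.publicIP()
	if err != nil {
		return false, fmt.Errorf("detect public IP: %w", err)
	}
	if ip == u.lastIP {
		return false, nil
	}
	if err := u.updateRecord(ip); err != nil {
		// Leave lastIP unchanged so the next tick retries the update.
		return false, fmt.Errorf("update record: %w", err)
	}
	u.lastIP = ip
	return true, nil
}

// run calls tick immediately and then every interval until stop is closed.
func (u *updater) run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		if _, err := u.tick(); err != nil {
			fmt.Println("ddns:", err)
		}
		select {
		case <-stop:
			return
		case <-t.C:
		}
	}
}

func main() {
	// Simulated IP sequence (documentation addresses): only the first
	// sighting and the change should trigger an update.
	ips := []string{"203.0.113.7", "203.0.113.7", "198.51.100.20"}
	i := 0
	u := &updater{
		publicIP:     func() (string, error) { ip := ips[i%len(ips)]; i++; return ip, nil },
		updateRecord: func(ip string) error { fmt.Println("updating record to", ip); return nil },
	}
	for range ips {
		u.tick()
	}
}
```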

Your DNS or DDNS provider may be flaky here, it’s true, but I don’t think that is what’s causing your QUIC status to be “misconfigured”.

This indicates the satellite couldn’t contact your node at all, and this is not specific to QUIC. Your node would show as “offline” until it got a successful pingback. So I don’t think this is your problem.

If you don’t see the log line I mentioned, then I’m still pretty sure there is a dashboard bug.


I get that, but only one satellite has that problem:
2023-10-23T22:53:47+01:00 ERROR contact:service ping satellite failed {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 10, "error": "ping satellite: check-in network: failed to resolve IP from address: hotnet.freeddns.com:28967, err: lookup hotnet.freeddns.com on 10.103.0.10:53: read udp 10.101.1.3:46183->10.103.0.10:53: i/o timeout", "errorVerbose": "ping satellite: check-in network: failed to resolve IP from address: hotnet.freeddns.com:28967, err: lookup hotnet.freeddns.com on 10.103.0.10:53: read udp 10.101.1.3:46183->10.103.0.10:53: i/o timeout\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:203\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

That results in:
2023-10-23T22:46:49+01:00 INFO reputation:service node scores updated {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Total Audits": 3, "Successful Audits": 1, "Audit Score": 1, "Online Score": 1, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}

All other satellites are connecting fine.
2023-10-23T22:46:50+01:00 INFO reputation:service node scores updated {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Total Audits": 39, "Successful Audits": 39, "Audit Score": 1, "Online Score": 1, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}

My suspicion is an error on the SATELLITE, not on the server, DNS, DDNS, or anything else…

Yes, this is talking about an error on the satellite, as you suggest, but the error is that it failed to look up hotnet.freeddns.com over DNS. The satellite does not typically experience many timeouts when looking up node addresses, so it would seem like the chief problem in this case is the nameservers at changeip.com. Those apparently handle DNS requests for the freeddns.com service.

Do you see that error message continually, or just once? I would expect the dashboard to show your node as entirely offline if the most recent pingback resulted in that error.

I can't understand why. I just did a DNS lookup (MXToolbox, WhatIsMyIP, and others), and my domain shows no problems. And, again, shouldn't all satellites have the same problem if it were something on my side?
Only Saltlake "can't connect", yet my score is still 100%.

Well, it is timing out while waiting for a reply, so the distance combined with the performance of the DNS servers may add too much latency for them to respond quickly enough. The lookup may eventually return the IP; it just doesn't do so fast enough.

Ok, and why is Saltlake different from other satellites?

Again, the path between Salt Lake and the DNS servers may be longer than it is for the other satellites. It could also be some kind of rate limit on the DNS servers themselves that is tarpitting Salt Lake. What we know from others who have had this issue is that switching DDNS providers resolved it.

There are around 20k nodes; if Saltlake were having this issue with everyone, this forum would be full of threads about it. What you have is most likely isolated to the DNS servers your DDNS provider uses. Can you switch DDNS providers?

I'll try. I just need to find one that works with my router's mandatory requirements…
I'll post the results later on. Thanks!