Suspension although 100% online

Why is my node suspended although it has a 100% online score?

Hi @champmine18,
Please post a screenshot of the dashboard showing that the node is suspended, along with the audit, suspension and online scores. Please also include any recent ‘ERROR’, ‘FATAL’ and ‘WARN’ log entries.


2022-01-18T11:57:21.410Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to dial storage node (ID: NODEID) at address DOMAINNAME:28967 using QUIC: rpc: quic: timeout: no recent network activity"}
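In case it helps anyone else digging through their logs: here is a minimal sketch for pulling the WARN/ERROR/FATAL entries out of a node log and reading the JSON part at the end of each line. The function names are made up for illustration, and it assumes every line has the same shape as the entry above (timestamp, level, component, message, then an optional JSON object). It is not part of the storagenode software.

```python
import json

# Levels worth posting when asking for help (taken from the request above).
LEVELS = {"WARN", "ERROR", "FATAL"}


def parse_entry(line):
    """Split one log line into (timestamp, level, component, message, fields).

    Assumes the layout: timestamp, level, component, free-text message,
    then an optional JSON object starting at the first '{'.
    """
    timestamp, level, component, rest = line.split(None, 3)
    brace = rest.find("{")
    if brace == -1:
        return timestamp, level, component, rest.strip(), {}
    message = rest[:brace].strip()
    fields = json.loads(rest[brace:])
    return timestamp, level, component, message, fields


def interesting(lines):
    """Keep only the lines whose second column is WARN, ERROR or FATAL."""
    kept = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[1] in LEVELS:
            kept.append(line)
    return kept


SAMPLE = (
    '2022-01-18T11:57:21.410Z WARN contact:service Your node is still '
    'considered to be online but encountered an error. '
    '{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", '
    '"Error": "contact: failed to dial storage node (ID: NODEID) at address '
    'DOMAINNAME:28967 using QUIC: rpc: quic: timeout: no recent network activity"}'
)
```

Calling `parse_entry(SAMPLE)` gives you the level and the `"Error"` text separately, which makes it easier to see that this particular warning is about the QUIC dial and not about audits.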

OK, found it: it was a DDNS problem, so the node could not be reached from outside. Thanks for the pointer! :slight_smile:
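For anyone hitting the same thing, one quick check is to compare what the DDNS name resolves to with the public IP you expect. A minimal sketch, assuming you already know your current public IP; the function name and its parameters are placeholders for illustration, not anything from the Storj tooling:

```python
import socket


def ddns_points_at(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return (ok, resolved_ip) for a DDNS hostname.

    `resolve` defaults to the system resolver; it is a parameter only so the
    check can be exercised without touching the network.
    """
    resolved = resolve(hostname)
    return resolved == expected_ip, resolved
```

For example, `ddns_points_at("mynode.example.org", "203.0.113.7")` (both values are made up) returns `(False, <stale address>)` if the DDNS record was never updated. Note that this only checks the DNS record; it says nothing about whether port forwarding works.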

most likely you need to set up UDP and open the UDP port, because otherwise QUIC won’t work.
but a lack of UDP/QUIC shouldn’t get you suspended.

so that error is unrelated i think…

I run several test networks across my LAN. I always disable UDP/QUIC whenever possible. If I don’t, my whole LAN gets flooded with packets and slows to a crawl. So, if Storj makes UDP/QUIC a non-optional uptime check, I’ll probably need to exit the network.

So with UDP/QUIC activated the nodes get much more traffic?

No.

UDP just sends packets… it doesn’t care if the receiver actually gets them. So, what happens is every application with UDP enabled gets a huge amount of packets… which may or may not reach the endpoint in the correct order or time.

This is actually a method of DDoS as well…

https://www.radware.com/security/ddos-knowledge-center/ddospedia/udp-flood/

There’s also a problem with UDP and my ISP… my ISP has rate limited my connection at various times when I’ve tried to enable UDP on some applications in tests.

In short, UDP is a fairly terrible idea for consumer level LANs connecting to consumer level ISPs.

EDIT:

Cloudflare has a better summary of a UDP Flood Attack:

https://www.cloudflare.com/learning/ddos/udp-flood-ddos-attack/

A UDP flood can be thought of in the context of a hotel receptionist routing calls. First, the receptionist receives a phone call where the caller asks to be connected to a specific room. The receptionist then needs to look through the list of all rooms to make sure that the guest is available in the room and willing to take the call. Once the receptionist realizes that the guest is not taking any calls, they have to pick the phone back up and tell the caller that the guest will not be taking the call. If suddenly all the phone lines light up simultaneously with similar requests then they will quickly become overwhelmed.

As each new UDP packet is received by the server, it goes through steps in order to process the request, utilizing server resources in the process. When UDP packets are transmitted, each packet will include the IP address of the source device. During this type of DDoS attack, an attacker will generally not use their own real IP address, but will instead spoof the source IP address of the UDP packets, impeding the attacker’s true location from being exposed and potentially saturated with the response packets from the targeted server.

As a result of the targeted server utilizing resources to check and then respond to each received UDP packet, the target’s resources can become quickly exhausted when a large flood of UDP packets are received, resulting in denial-of-service to normal traffic.


This is maybe a misunderstanding or a partial truth.

You’re right that UDP by itself (just like IP packets) doesn’t provide session orientation or awareness of receiver-speed, but that is okay, because QUIC adds those things.

So for context, TCP is built on top of IP packets. IP packets are like post cards - they get sent and no one cares if they arrive or are responded to. TCP adds acknowledgements, packet ordering, port numbers, and so on. With TCP, if the receiver starts going slower, the sender slows down, and vice versa. Unfortunately, TCP bakes some pretty rigid congestion control settings into your operating system kernel, so if you wanted to tune it to use fewer resources on routers when you have hundreds of parallel streams, it’s pretty hard to do.

Similar to TCP, QUIC is built on top of UDP packets, which are just IP packets with a port number. QUIC adds acknowledgements, packet ordering, and so on. QUIC slows down if the receiver slows down. Crucially, QUIC allows us to improve the congestion controller to be more aware of the parallelism inherent in Storj upload/download streams.
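To make the layering idea concrete, here is a toy sketch of what "adding acknowledgements and packet ordering on top of a postcard service" means. It simulates a lossy, reordering channel in memory instead of using real sockets, and it is not QUIC or anything from the Storj code base: the function names, the loss rate, and the simplification that acknowledgements never get lost are all assumptions made to keep it short.

```python
import random


def unreliable_send(packets, loss_rate, rng):
    """Datagram-style delivery: packets may be dropped and arrive in any order."""
    delivered = [p for p in packets if rng.random() >= loss_rate]
    rng.shuffle(delivered)
    return delivered


def reliable_transfer(chunks, loss_rate=0.3, seed=1):
    """Resend every unacknowledged chunk until the receiver has all of them.

    Returns the reassembled data and the number of send rounds it took.
    """
    rng = random.Random(seed)
    received = {}                      # receiver side: sequence number -> payload
    unacked = set(range(len(chunks)))  # sender side: chunks not yet confirmed
    rounds = 0
    while unacked:
        rounds += 1
        outgoing = [(seq, chunks[seq]) for seq in sorted(unacked)]
        for seq, payload in unreliable_send(outgoing, loss_rate, rng):
            received[seq] = payload
            # The receiver acknowledges seq; in this sketch the ack always arrives.
            unacked.discard(seq)
    # Sequence numbers let the receiver put the data back in order.
    data = b"".join(received[seq] for seq in sorted(received))
    return data, rounds
```

Even though the channel drops and shuffles packets, `reliable_transfer([b"post", b"card", b"s"])` hands back the original bytes in order. That bookkeeping is the part TCP does in the kernel and QUIC does in the application.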

When we want UDP enabled for Storj, it’s because we want to use QUIC, which gives us better control over congestion control than TCP does. Using QUIC we will be able to protect your router better, with more complete knowledge of the nature of our distributed parallel network streams.

It’s always possible for someone to get DoSed with IP packets, which have all of the same problems you ascribe to UDP. Enabling or disabling UDP doesn’t change the fact that anyone can send you packets. It’s what you or your router do with the packets that matters.

It is precisely for the reasons you evidently disable UDP that we are moving towards QUIC on top of UDP. We want to be better network citizens, and right now hundreds of TCP connections with dueling congestion control can stress routers out pretty badly.
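For anyone wondering what "congestion control" boils down to, here is a toy additive-increase/multiplicative-decrease (AIMD) loop, the classic scheme that TCP congestion control is built around. This is only an illustration with made-up numbers and a made-up function name; it is not the controller QUIC or Storj uses, and real controllers react to per-flow loss and delay signals rather than a shared capacity check.

```python
def aimd(flows, capacity, rounds, increase=1.0, decrease=0.5):
    """Simulate `flows` senders sharing one link of size `capacity`.

    Each round every flow speeds up a little; once the combined rate
    exceeds the link capacity, every flow cuts its rate.
    Returns the combined sending rate after each round.
    """
    rates = [1.0] * flows
    history = []
    for _ in range(rounds):
        if sum(rates) > capacity:
            # Congestion detected: multiplicative decrease for every flow.
            rates = [r * decrease for r in rates]
        else:
            # No congestion: additive increase for every flow.
            rates = [r + increase for r in rates]
        history.append(sum(rates))
    return history
```

With `aimd(flows=4, capacity=100, rounds=200)` the combined rate climbs, overshoots the link slightly, halves, and climbs again. With many independent flows each running their own version of this loop, the overshoot and backoff cycles are what the router has to absorb.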


What I know from my own experience running several IPFS nodes with QUIC on my LAN is the following:

  • My LAN slows to a crawl
  • My ISP rate limits me
  • My routers scream in pain

Disabling QUIC fixes all of that.

If Storj moves to 100% QUIC, I will have no choice but to exit the network.

I would love to know more about that. I of course believe you but you might be applying a more heavy handed intervention than is strictly necessary. Certainly when we roll out QUIC (which we are working on doing, but carefully), we are doing so to avoid these issues, not cause them more.

Would you be willing to spend some time collecting more data about when and why this happens to you?

… you will be minus one SNO.

Sorry to say, but I will not run QUIC on my LAN.

cause and effect is often not directly correlated; a symptom is most often caused by a multitude of causes, which combined will have an effect…

i also don’t doubt that this is happening, and i must admit that exactly how or why it would happen isn’t my specialty, but UDP is a fundamental part of the IP protocol suite… i doubt it should cause serious issues.

but sometimes stuff just runs off the rails, and it’s never fun to deal with the aftermath.
i am often easily fooled into thinking one thing has caused something on my system or network, only to realize after much investigation that it was some odd niche-case configuration issue or the like.

my system is a bit on the crazy side, but i haven’t seen any issues at all when enabling UDP.
in fact i tried to measure if there was a bandwidth difference just for fun… but didn’t see one…
my network is also rather complex, lots of hops and back-and-forths + a good number of nodes… and i only run on regular 1Gbit Ethernet using twisted-pair cables.
i cannot imagine that if there was a problem with running UDP i wouldn’t experience it… but ofc the gods of random could easily see to that…
just saying, i would be a prime case for that, and i don’t see it…
i do have some NIC buffer overflow issues i need to get solved… but haven’t gotten around to it… seems like my NICs have a 4K buffer but only use 512B by default…
tried to fix it but no luck yet… so i drop like 1-2% of packets… but the storagenodes don’t care… it’s a digital data stream after all… a few dropped packets don’t make all that much of a difference.

you should really let jtolio try to figure out what is wrong with your network, for your own and the future of storj’s sake, solving these problems early on is the key to a successful network and happy customers, which benefits us all.

and maybe in the process you will discover what actually is causing issues on your network.
because i seriously doubt a bit of a workload should make your network stall.


There’s nothing wrong with my network.

QUIC is UDP traffic, a packet is not just a packet. UDP vs. TCP matters a lot.

This traffic tends to be marked as a DDoS attack vector by ISPs… thus the rate limiting when I enable it on my IPFS nodes.

And the traffic itself tends to mangle my LAN.

I’m simply indicating my own experiences. When I had problems with IPFS, I went looking for answers as to why QUIC may be problematic. It’s extremely difficult to find answers… since most people either understand the theory, but have little practical experience running things or are running things on Enterprise hardware and Commercial Internet access.

My personal experience is that running several services with QUIC enabled results in significant problems.

I’m not interested in trying to figure out why… I already invested weeks troubleshooting my IPFS nodes. Turning off QUIC fixed everything. So, that’s where I’m at.

For some information… Here’s a randomly found 3 year old article on QUIC, Security, and other network considerations:

https://www.fastvue.co/fastvue/blog/googles-quic-protocols-security-and-reporting-implications/

I would seriously advise Storj not to make QUIC 100% mandatory. Unless they want to deal with the smaller SNOs simply exiting the network when their consumer ISPs send them a letter about running a “compromised” server… or something like that… and, no, I didn’t get any such letter… but I did get rate limited.


EDIT:

I found an excellent write up on QUIC by Fastly:

https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency

Their conclusion indicates that a best case QUIC implementation could achieve a 1% increase in throughput over TCP with TLS 1.3 …

I’m not quite sure why there seems to be a push to use QUIC on many projects. My own experience is that UDP packets flood my network and cause problems. The Fastly write up shows that there’s not much of an increase in speed under the best case scenarios…

If Storj decides to switch from TCP to QUIC, I’ll be sad to leave the project… but will no longer be able to run a node.
