Changes to the UDP buffer size for QUIC needed!

Hello

Today I noticed a new INFO line at node startup in the logs.

2025-07-14T14:30:37Z    INFO    failed to sufficiently increase receive buffer size
 (was: 208 kiB, wanted: 7168 kiB, got: 4096 kiB).
 See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.


Following the instructions on GitHub, I changed the size of the buffer(s) and restarted the node; the INFO line was gone.
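For anyone else hitting this, the change on Linux is a sketch like the following, raising the kernel's UDP buffer limits as the quic-go wiki describes (the 7500000-byte value and the `99-udp-buffers.conf` filename are the wiki's suggestion and my own choice, respectively, not official Storj settings):

```shell
# Raise the maximum socket receive/send buffer sizes (run as root).
# 7500000 bytes (~7.15 MiB) covers the 7168 kiB the node asks for.
sysctl -w net.core.rmem_max=7500000
sysctl -w net.core.wmem_max=7500000

# Persist the change across reboots:
printf 'net.core.rmem_max=7500000\nnet.core.wmem_max=7500000\n' \
  > /etc/sysctl.d/99-udp-buffers.conf
```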

The change seems to work.

IIRC it has been there for more than a year now :slight_smile: I am glad you got it working.

1 Like

There is also one line for TCP fastopen.

2 Likes

Well it’s nice to see that people read documentation! Kudos to @Krawi

Which is arguably much more impactful.
Nobody uses QUIC, nobody cares about QUIC, QUIC is dead.

But everyone [attempts to] use[s] TFO, and if your node does not respond to it – well, you lose more races.

I’m wondering, is it possible to disable support for QUIC in the storagenode? Is it advisable? I’ll go read the documentation.

Better yet – perhaps storj can consider switching to a different QUIC go library? The one currently in use is evidently pretty unstable in practice. Maybe then usage will increase?

Maybe quit using third-party libraries for an unused function; they can be used in supply-chain attacks, or however that is called…

1 Like

I agree that stuff that is not used should be compiled out as much as possible. But I’m still not convinced that the problem is with the quic library and not in how Storj is using and/or configuring it.

On paper, QUIC should be more performant and stable than TCP with TFO: it offers useful features like 0-RTT connection establishment (TFO just saves one round trip on subsequent connections), supports multiple streams per connection, and benefits from extensive testing due to its wide adoption.

quic-go is the most popular Go implementation and therefore the most stable one. There are other implementations, but they are more low-level. I think there is even one from Microsoft.

That said, it’s entirely possible the issues lie outside QUIC and may not be solvable, especially in scenarios where the server is behind NAT on a device with multiple network interfaces. That’s why binding to a specific server address and keeping the port consistent across DNATs resolves the “QUIC Misconfigured” status in the vast majority of cases.

Relevant:

Some time ago I stumbled on this paper, suggesting that there are general quality of implementation issues coming from the network stack itself that affect QUIC. As in, there was so much effort spent on optimizing regular TCP that a good implementation of a TCP-based protocol beats a standard implementation of a UDP-based protocol.

1 Like

We have it in the documentation:

Yes, it’s in the docs, but look at the suggested number for the buffer size.

sysctl -w net.core.rmem_max=2500000

That is ~2.5 MB, and the current version of the node now requests 7 MB!

The post on GitHub suggests ~7.5 MB, and that works fine.

Don’t the Storj docs need an update to reflect this?
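To see whether a given Linux machine is affected, you can compare the current limit against what the node asks for; 7168 kiB works out to 7340032 bytes, well above the docs' suggested 2500000:

```shell
# Show the current receive-buffer cap (no root needed to read):
cat /proc/sys/net/core/rmem_max

# The node's request, converted from kiB to bytes:
echo $((7168 * 1024))
```

If the first number is smaller than the second, the INFO line will appear at startup.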

Maybe. Do you have many such messages?
I found only two on my biggest node.

Not many, they appear only at node startup, like here (Line #17):

2025-07-01T12:12:40Z	INFO	Configuration loaded	{"Process": "storagenode", "Location": "/app/config/config.yaml"}
2025-07-01T12:12:40Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2025-07-01T12:12:40Z	INFO	Operator email	{"Process": "storagenode", "Address": "<eMail>"}
2025-07-01T12:12:40Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "<Wallet>"}
2025-07-01T12:12:45Z	INFO	server	existing kernel support for server-side tcp fast open detected	{"Process": "storagenode"}
2025-07-01T12:12:48Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "open_time": "2.616363368s"}
2025-07-01T12:12:53Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "open_time": "5.060323016s"}
2025-07-01T12:12:56Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "open_time": "3.354724116s"}
2025-07-01T12:12:57Z	INFO	Telemetry enabled	{"Process": "storagenode", "instance ID": "1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL"}
2025-07-01T12:12:57Z	INFO	Event collection enabled	{"Process": "storagenode", "instance ID": "1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL"}
2025-07-01T12:12:57Z	INFO	db.migration	Database Version	{"Process": "storagenode", "version": 62}
2025-07-01T12:12:57Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Node 1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL started	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Public server started on [::]:28967	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Private server started on 127.0.0.1:7778	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 4096 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	trust	Scheduling next refresh	{"Process": "storagenode", "after": "5h36m5.359603938s"}
2025-07-01T12:12:58Z	INFO	bandwidth	Persisting bandwidth usage cache to db	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	pieces:trash	emptying trash started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-07-01T12:12:58Z	INFO	piecemigrate:chore	all enqueued for migration; will sleep before next pooling	{"Process": "storagenode", "active": {"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6": false, "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S": false, "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs": false, "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": false}, "interval": "10m0s"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 260880928000, "Total Pieces Content Size": 260307132160, "Total Pieces Count": 1120695, "Duration": "279.404497ms"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 49245677568, "Total Pieces Content Size": 49138321920, "Total Pieces Count": 209679, "Duration": "22.486898ms"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 1030640089344, "Total Pieces Content Size": 1028754478848, "Total Pieces Count": 3682833, "Duration": "91.074319ms"}

When I have the time, I check the nodes' logs after a manual start for errors and warnings, but this line is only at INFO level, so I missed it.

I’ve just checked: I have kern.ipc.maxsockbuf=67108864 (64 MB) set since forever. Because why not? On the other hand, I have no issues with QUIC on any of my nodes after doing two things:

  • explicitly binding to the specific interface
  • maintaining the port number throughout the translation.
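For a docker-run node, those two points might look like this sketch (192.0.2.10 is a placeholder for the host's LAN address; the point is that the published port binds to one specific interface and the port number stays 28967 on both sides, for both TCP and UDP):

```shell
# Bind the published port to one specific host interface and keep the
# port identical outside and inside the container:
docker run -d --name storagenode \
  -p 192.0.2.10:28967:28967/tcp \
  -p 192.0.2.10:28967:28967/udp \
  storjlabs/storagenode:latest
```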

But again, if QUIC is slower under Storj’s circumstances, it had better be disabled (or left happily malfunctioning – but I’d assume this wastes a lot of resources on the client).

Increasing the buffers to 7.5MB didn’t change anything for my nodes.
Synology DS216+ with Celeron N3050 and 8GB RAM – QUIC misconfigured… still. AR’s workaround didn’t help either. In the beginning, before that famous update to the QUIC dependencies, it was working.
The other setups work as before, QUIC OK:
Synology DS220+ with Celeron J4025 and 18GB RAM,
n100 + 32GB RAM + Ubuntu server.

My SNO runs on Windows. Why is QUIC on version 1.130.8 more stable (no issues) than on version 1.131?
I’m still using version 1.130.8.

It’s not more stable. It’s just different and happens to work in your environment. Did you bind to an interface and fix the port?

QUIC connections are a headache, sometimes working on one version and not on another. From the beginning, I only changed the TCP and UDP forwarding.

Yes, this has to do with the nature of QUIC. Did you implement the three recommendations mentioned above and still see issues?

  • Set buffer, no
  • bind, no
  • don’t change port, yes

quic is really weird

Not really weird.

Consider this: QUIC is a whole new protocol layered on UDP. Think about what happens when the server needs to send a datagram but has seven interfaces: which interface should the datagrams go through? The kernel does not know that the packet is part of the higher-level QUIC protocol; it’s just a UDP packet. That’s why setting the interface explicitly helps. You can see this for yourself with tcpdump.
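A quick way to check on a multi-homed box (the interface names and port 28967 are assumptions for a default node setup):

```shell
# Watch which interface the node's UDP datagrams actually leave from;
# run one capture per interface and trigger some QUIC traffic:
tcpdump -ni eth0 'udp port 28967'
tcpdump -ni eth1 'udp port 28967'
```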

Keeping the port the same throughout the forwarding chain probably helps with QUIC path validation.
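On a Linux router, a port-preserving DNAT rule is a sketch like the following (192.168.1.50 is a placeholder for the node's LAN address):

```shell
# Forward UDP 28967 to the node without rewriting the port, so the
# client sees a consistent address/port pair during path validation:
iptables -t nat -A PREROUTING -p udp --dport 28967 \
  -j DNAT --to-destination 192.168.1.50:28967
```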

There is no reason to attempt to discuss or troubleshoot this any further until those configurations are made.

I tried both the buffer size and explicit binding for my Windows nodes, but it did not help with QUIC. Sometimes it says OK, but most of the time Misconfigured. I can’t see any pattern.

I second the removal of this alert from the GUI until QUIC is really used.