Changes to the UDP buffer size for QUIC needed!

Hello

Today I noticed a new INFO line at node startup in the logs.

2025-07-14T14:30:37Z    INFO    failed to sufficiently increase receive buffer size
 (was: 208 kiB, wanted: 7168 kiB, got: 4096 kiB).
 See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.


Following the instructions on GitHub, I changed the size of the buffer(s) and restarted the node; the INFO line was gone.
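For anyone else hitting this, the change on Linux is a sketch like the following, raising the kernel's UDP buffer limits as the quic-go wiki describes (the 7500000-byte value and the `99-udp-buffers.conf` filename are the wiki's suggestion and my own choice, respectively, not official Storj settings):

```shell
# Raise the maximum socket receive/send buffer sizes (run as root).
# 7500000 bytes (~7.15 MiB) covers the 7168 kiB the node asks for.
sysctl -w net.core.rmem_max=7500000
sysctl -w net.core.wmem_max=7500000

# Persist the change across reboots:
printf 'net.core.rmem_max=7500000\nnet.core.wmem_max=7500000\n' \
  > /etc/sysctl.d/99-udp-buffers.conf
```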

The change seems to work.

IIRC it has been there for more than a year now :slight_smile: I am glad you got it working.

1 Like

There is also one line for TCP fastopen.

2 Likes

Well it’s nice to see that people read documentation! Kudos to @Krawi

Which is arguably much more impactful.
Nobody uses QUIC, nobody cares about QUIC, QUIC is dead.

But everyone [attempts to] use[s] TFO, and if your node does not respond to it – well, you lose more races.

I’m wondering, is it possible to disable support for QUIC in the storagenode? Is it advisable? I’ll go read the documentation.

Better yet – perhaps storj can consider switching to a different QUIC go library? The one currently in use is evidently pretty unstable in practice. Maybe then usage will increase?

Maybe quit using third-party libraries for an unused function; they can be used in supply-chain attacks, or however that is called…

1 Like

I agree that stuff that is not used should be compiled out as much as possible. But I’m still not convinced that the problem is with the quic library and not in how Storj is using and/or configuring it.

On paper, QUIC should be more performant and stable than TCP with TFO: it offers useful features like 0-RTT connection establishment (TFO just saves one round trip on subsequent connections), supports multiple streams per connection, and benefits from extensive testing due to its wide adoption.

quic-go is the most popular Go implementation and therefore the most stable one. There are other implementations, but they are more low-level. I think there is even one from Microsoft.

That said, it’s entirely possible the issues lie outside QUIC and may not be solvable, especially in scenarios where the server is behind NAT on a device with multiple network interfaces. That’s why binding to a specific server address and keeping the port consistent across DNATs resolves the “QUIC Misconfigured” status in the vast majority of cases.

Relevant:

Some time ago I stumbled on this paper, suggesting that there are general quality of implementation issues coming from the network stack itself that affect QUIC. As in, there was so much effort spent on optimizing regular TCP that a good implementation of a TCP-based protocol beats a standard implementation of a UDP-based protocol.

1 Like

We have it in the documentation:

Yes, it’s in the docs, but look at the suggested number for the buffer size.

sysctl -w net.core.rmem_max=2500000

That is ~2.5 MB, and the current version of the node now requests 7 MB!

The post on GitHub suggests ~7.5 MB, and that works fine.

Don’t the Storj docs need an update to reflect this?
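To see whether a given Linux machine is affected, you can compare the current limit against what the node asks for; 7168 kiB works out to 7340032 bytes, well above the docs' suggested 2500000:

```shell
# Show the current receive-buffer cap (no root needed to read):
cat /proc/sys/net/core/rmem_max

# The node's request, converted from kiB to bytes:
echo $((7168 * 1024))
```

If the first number is smaller than the second, the INFO line will appear at startup.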

Maybe. Do you have many such messages?
I found only two on my biggest node.

Not many, they appear only at node startup, like here (Line #17):

2025-07-01T12:12:40Z	INFO	Configuration loaded	{"Process": "storagenode", "Location": "/app/config/config.yaml"}
2025-07-01T12:12:40Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2025-07-01T12:12:40Z	INFO	Operator email	{"Process": "storagenode", "Address": "<eMail>"}
2025-07-01T12:12:40Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "<Wallet>"}
2025-07-01T12:12:45Z	INFO	server	existing kernel support for server-side tcp fast open detected	{"Process": "storagenode"}
2025-07-01T12:12:48Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "open_time": "2.616363368s"}
2025-07-01T12:12:53Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "open_time": "5.060323016s"}
2025-07-01T12:12:56Z	INFO	hashstore	hashstore opened successfully	{"Process": "storagenode", "satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "open_time": "3.354724116s"}
2025-07-01T12:12:57Z	INFO	Telemetry enabled	{"Process": "storagenode", "instance ID": "1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL"}
2025-07-01T12:12:57Z	INFO	Event collection enabled	{"Process": "storagenode", "instance ID": "1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL"}
2025-07-01T12:12:57Z	INFO	db.migration	Database Version	{"Process": "storagenode", "version": 62}
2025-07-01T12:12:57Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Node 1DghwxQbwJHrCkWSqnKThZFs2VSCEUjL6KGFhAD9SyeXagXJeL started	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Public server started on [::]:28967	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	Private server started on 127.0.0.1:7778	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 4096 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	trust	Scheduling next refresh	{"Process": "storagenode", "after": "5h36m5.359603938s"}
2025-07-01T12:12:58Z	INFO	bandwidth	Persisting bandwidth usage cache to db	{"Process": "storagenode"}
2025-07-01T12:12:58Z	INFO	pieces:trash	emptying trash started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-07-01T12:12:58Z	INFO	piecemigrate:chore	all enqueued for migration; will sleep before next pooling	{"Process": "storagenode", "active": {"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6": false, "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S": false, "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs": false, "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": false}, "interval": "10m0s"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 260880928000, "Total Pieces Content Size": 260307132160, "Total Pieces Count": 1120695, "Duration": "279.404497ms"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 49245677568, "Total Pieces Content Size": 49138321920, "Total Pieces Count": 209679, "Duration": "22.486898ms"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker started	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2025-07-01T12:12:58Z	INFO	pieces	used-space-filewalker completed	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 1030640089344, "Total Pieces Content Size": 1028754478848, "Total Pieces Count": 3682833, "Duration": "91.074319ms"}

When I have the time, I check the nodes' logs after a manual start for errors and warnings, but this line is only at INFO level, so I missed it.

I’ve just checked: I have kern.ipc.maxsockbuf=67108864 (64 MB) set since forever. Because why not? On the other hand, I have no issues with QUIC on any of my nodes after doing two things:

  • explicitly binding to the specific interface
  • maintaining the port number throughout the translation.
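For a docker-run node, those two points might look like this sketch (192.0.2.10 is a placeholder for the host's LAN address; the point is that the published port binds to one specific interface and the port number stays 28967 on both sides, for both TCP and UDP):

```shell
# Bind the published port to one specific host interface and keep the
# port identical outside and inside the container:
docker run -d --name storagenode \
  -p 192.0.2.10:28967:28967/tcp \
  -p 192.0.2.10:28967:28967/udp \
  storjlabs/storagenode:latest
```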

But again, if QUIC is slower under Storj’s circumstances, it had better be disabled (or left happily malfunctioning – but I’d assume this wastes a lot of resources on the client).

Increasing the buffers to 7.5MB didn’t change anything for my nodes.
Synology DS216+ with Celeron N3050 and 8GB RAM – QUIC misconfigured… still. AR’s workaround didn’t help either. In the beginning, before that famous update to the QUIC dependencies, it was working.
The other setups work as before, QUIC OK:
Synology DS220+ with Celeron J4025 and 18GB RAM,
n100 + 32GB RAM + Ubuntu server.

My SNO runs on Windows. Why is QUIC on version 1.130.8 more stable (no issues) than on version 1.131?
I’m still using version 1.130.8.

It’s not more stable. It’s just different and happens to work in your environment. Did you bind to an interface and fix the port?

QUIC connections are a headache, sometimes working on one version and not on another. From the beginning, I only changed the TCP and UDP forwarding.

Yes, this has to do with the nature of QUIC. Did you implement the three recommendations mentioned above and still see issues?

  • Set buffer, no
  • bind, no
  • don’t change port, yes

quic is really weird

Not really weird.

Consider this: QUIC is a whole new protocol layered on UDP. Think about what happens when the server needs to send a datagram but has seven interfaces: which interface should the datagrams go through? The kernel does not know that the packet is part of the higher-level QUIC protocol; it’s just a UDP packet. That’s why setting the interface explicitly helps. You can see this for yourself with tcpdump.
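A quick way to check on a multi-homed box (the interface names and port 28967 are assumptions for a default node setup):

```shell
# Watch which interface the node's UDP datagrams actually leave from;
# run one capture per interface and trigger some QUIC traffic:
tcpdump -ni eth0 'udp port 28967'
tcpdump -ni eth1 'udp port 28967'
```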

Keeping the port the same throughout the forwarding chain probably helps with QUIC path validation.
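On a Linux router, a port-preserving DNAT rule is a sketch like the following (192.168.1.50 is a placeholder for the node's LAN address):

```shell
# Forward UDP 28967 to the node without rewriting the port, so the
# client sees a consistent address/port pair during path validation:
iptables -t nat -A PREROUTING -p udp --dport 28967 \
  -j DNAT --to-destination 192.168.1.50:28967
```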

There is no reason to attempt to discuss or troubleshoot this any further until those configurations are made.

I tried both the buffer size and explicit binding for my Windows nodes, but it did not help with QUIC. Sometimes it says OK, but most of the time Misconfigured. I can’t see any pattern.

I second the removal of this alert from the GUI until QUIC is really used.