TL;DR: tcp_mem is set at boot by the kernel, and is wholly dependent upon your system's ram. increase the ram and the message will probably go away. then begins the wumpus hunt. here be dragons.
i went through this a few months ago, and just found this thread whilst randomly searching for ‘consider tuning tcp_mem’.
i work for uc berkeley and run a ~14k users/semester jupyterhub deployment (~20 hubs), so we get a loooot of network traffic. we started seeing this message after scaling up to this number of users, and it would end up taking the entire system down. this is BAD.
after a little re-architecting (aka splitting the pods up by workload, which effectively isolated the issue) we decided to address the tcp_mem limit itself vs tracking down the port leak in some nodejs package that was the dep of a dep of our ingress controllers. anyways.
the article i used is from red hat, but i did the math on both ubuntu 22.04 LTS and gcp container os and my results matched (almost perfectly) what was expected.
the link will probably take you to their search page, but just query ‘tcp_mem’ and it’s the first result. the article is called ‘How net.ipv4.tcp_mem value calculated?’: How net.ipv4.tcp_mem value calculated? - Red Hat Customer Portal
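for anyone who wants to sanity-check the math, here's a rough python sketch of the calculation. it follows what recent kernels appear to do in `tcp_init_mem()` (a limit of 1/16th of the pages, then 3/4 of that, the limit itself, and double the low value), which is an assumption taken from the kernel source and not from the red hat article. another big assumption: it plugs in total ram, whereas the kernel counts free buffer pages at boot, so the real values on a node will come out somewhat lower. `estimate_tcp_mem` is a made-up name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (check with `getconf PAGESIZE`)


def estimate_tcp_mem(total_ram_bytes, page_size=PAGE_SIZE):
    """Approximate the (low, pressure, high) tcp_mem thresholds, in pages.

    Uses total RAM as a stand-in for the kernel's free buffer page count,
    so treat the result as an upper-bound estimate.
    """
    pages = total_ram_bytes // page_size
    limit = max(pages // 16, 128)  # kernel enforces a floor of 128 pages
    low = limit // 4 * 3
    pressure = limit
    high = low * 2
    return low, pressure, high


if __name__ == "__main__":
    gib = 1024 ** 3
    for ram_gib in (32, 64):
        print(ram_gib, "GiB ->", estimate_tcp_mem(ram_gib * gib))
```

compare the output against `cat /proc/sys/net/ipv4/tcp_mem` on the node. the values are in pages, not bytes, so multiply by the page size if you want bytes.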
secondly, i just discovered this post, which offers up some good insights into the sysctl settings used for high usage prod services (RIP lastfm): Linux Kernel Tuning - Russ Garrett
the post is almost 15 years old but remains pertinent.
if folks are curious, i can share the settings that we have deployed. after reading russ garrett’s lastfm blog, i think that i need to rethink what we’re doing. there was a LOT of hand-waving when coming up with these numbers… and i still hadn’t figured out exactly how tcp_mem was set when i committed them!
(edit: i can’t post 3 links as a new user but i’d be happy to share my incorrect settings in a comment)
anyways, we were running our jupyterhub core pods and the corresponding (1:1) configurable-http-proxy pods on the same GCP node (user pods were located elsewhere). moving from a GCP n2-standard-8 to an n2-highmem-8 (32 to 64 GB of ram) doubled our tcp_mem and put an end to mid-semester outages. omg. so happy.
it didn’t fix the underlying problem, but at least smaller outages mean more time to debug.
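if you want to watch how close a node is getting to the limit while you debug, something like the sketch below might help. it assumes the usual linux layout: `/proc/sys/net/ipv4/tcp_mem` holds three page counts (low, pressure, high) and the `TCP:` line of `/proc/net/sockstat` has a `mem` field, also in pages. the function names are made up for illustration, and this is not what we actually run.

```python
def parse_tcp_mem(text):
    """Parse the contents of /proc/sys/net/ipv4/tcp_mem into three ints (pages)."""
    low, pressure, high = (int(x) for x in text.split())
    return low, pressure, high


def parse_sockstat_tcp_pages(text):
    """Return the 'mem' value (pages) from the TCP: line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()
            return int(fields[fields.index("mem") + 1])
    raise ValueError("no TCP line found in sockstat text")


def tcp_mem_usage(tcp_mem_text, sockstat_text):
    """Compare current TCP page usage against the pressure and high thresholds."""
    _, pressure, high = parse_tcp_mem(tcp_mem_text)
    used = parse_sockstat_tcp_pages(sockstat_text)
    return {
        "used_pages": used,
        "pressure": pressure,
        "high": high,
        "fraction_of_high": used / high,
    }


def report():
    """Read the live values from /proc (Linux only) and return the comparison."""
    with open("/proc/sys/net/ipv4/tcp_mem") as f:
        tcp_mem_text = f.read()
    with open("/proc/net/sockstat") as f:
        sockstat_text = f.read()
    return tcp_mem_usage(tcp_mem_text, sockstat_text)
```

calling `report()` on the node (or from a privileged pod that can see the host's /proc) gives a quick view of how much headroom is left. if `fraction_of_high` keeps creeping up over days, that points at a leak and not just a busy semester.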
hope this helps!