V1.6.4 too many open files bug

My storagenode operating on port 20858 produces a "too many open files" error after some time and dies; it restarts, and after a while the same error appears again.

FATAL Unrecoverable error {"error": "accept tcp [::]:20858: accept4: too many open files; accept tcp [::]:20858: accept4: too many open files", "errorVerbose": "group:\n— accept tcp [::]:20858: accept4: too many open files\n\tstorj.io/drpc/drpcserver.(*Server).Serve:92\n\tstorj.io/storj/pkg/server.(*Server).Run.func4:198\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n— accept tcp [::]:20858: accept4: too many open files"}

Is there a way to check whether this is related to a Go bug that leaves file descriptors open indefinitely?

Hey @champmine18,

can you tell me a little bit more about your setup?

  • Did you change anything from the default config?
  • What's your available bandwidth?
  • Where are you located?
  • What kind of machine do you run the node on?
  • Does the node serve any other service/purpose besides the storj storagenode?
  • Are you perhaps running more than 1 node on that machine?

Can you also provide the output of ulimit -a here?

Typically this error only occurs if your system is overloaded with requests and cannot handle them quickly enough, so it stalls and accumulates a large number of open files and TCP connections.
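If you want to see how many descriptors the node is actually holding before it dies, a rough check from the host (run as root; it assumes the process command line contains "storagenode") would be:

# count open file descriptors per storagenode process
for pid in $(pgrep -f storagenode); do
  echo "$pid: $(ls /proc/$pid/fd | wc -l) open fds"
done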

Thanks!

Hi @stefanbenten

I am honored to hear from you, thanks

  • no changes to the default config
  • about 80 Mbit up/down
  • Germany
  • Ubuntu, with the nodes running in Docker
  • no other service, just storagenode, but multiple of them
  • yes, running more than one node on that machine

ulimit -a

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62814
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 62814
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

When running more than one node on that system, the file handle limit quickly becomes a problem.
You have two choices: either increase that limit or, my recommendation, run fewer but bigger nodes. :slight_smile:
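Since the nodes run in Docker, the open-file limit can also be raised per container via docker's --ulimit flag. This is only a minimal sketch: the container name and port mapping are placeholders, and the identity/storage mounts the node normally needs are omitted.

# raise the per-container open-file limit (soft:hard) when starting a node
docker run -d --name storagenode1 \
  --ulimit nofile=65536:65536 \
  -p 20858:28967/tcp \
  storjlabs/storagenode:latest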

To read the current system-level default, run:

root# systemctl show --property DefaultLimitNOFILE

It sounds like the solution is to modify

/etc/systemd/system.conf

and uncomment the "#DefaultLimitNOFILE=" line, setting it to a value such as "DefaultLimitNOFILE=65536".
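A sketch of that change (65536 is just an example value; after editing, systemd has to re-read its configuration and the affected services need a restart):

# /etc/systemd/system.conf (excerpt)
DefaultLimitNOFILE=65536

# make systemd pick up the new default
systemctl daemon-reexec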

Also modify

/etc/sysctl.conf

and add a value at the end of the file, such as "fs.file-max = 65536".
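A sketch of that change, applied without a reboot (again, 65536 is only an example value):

# /etc/sysctl.conf (excerpt)
fs.file-max = 65536

# reload the file and verify the new kernel-wide limit
sysctl -p
sysctl fs.file-max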

The current limits are shown above. Just increasing them to such high values is not a good choice unless you are absolutely sure about the consequences…
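One way to judge whether the kernel-wide maximum is actually close to being exhausted (as opposed to the 1024 per-process limit in the ulimit output above) is to look at the kernel's file handle accounting:

# three fields: allocated handles, unused handles, and the system-wide maximum (fs.file-max)
cat /proc/sys/fs/file-nr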

Well, the problem is that the machine had been running stably for more than six months, but now some of the nodes on it are getting errors.

Thanks for the hints, I will try to play with those config values.