V1.6.4 too many open files bug

My storagenode operating on port 20858 produces a "too many open files" error after some time and dies; it restarts, and after a while the same error appears again.

FATAL Unrecoverable error {"error": "accept tcp [::]:20858: accept4: too many open files; accept tcp [::]:20858: accept4: too many open files", "errorVerbose": "group:\n— accept tcp [::]:20858: accept4: too many open files\n\tstorj.io/drpc/drpcserver.(*Server).Serve:92\n\tstorj.io/storj/pkg/server.(*Server).Run.func4:198\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n— accept tcp [::]:20858: accept4: too many open files"}

Is there a way to check whether this is related to a Go bug that leaves file descriptors open indefinitely?

Hey @champmine18,

can you tell me a little bit more about your setup?

  • Did you change anything from the default config?
  • What's your available bandwidth?
  • Where are you located?
  • What kind of machine do you run the node on?
  • Does the node serve any other service/purpose besides the storj storagenode?
  • Are you perhaps running more than 1 node on that machine?

Can you also provide the output of ulimit -a here?

Typically this error only occurs if your system is overloaded with requests and cannot handle them quickly enough, so it stalls and accumulates a large number of open files and TCP connections.
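If you want to see how many descriptors the node is actually holding before it dies, a rough check from the host (run as root; it assumes the process command line contains "storagenode") would be:

# count open file descriptors per storagenode process
for pid in $(pgrep -f storagenode); do
  echo "$pid: $(ls /proc/$pid/fd | wc -l) open fds"
done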

Thanks!

Hi @stefanbenten

I am honored to hear from you, thanks

  • no changes to the default config
  • about 80 Mbit up/down
  • Germany
  • Ubuntu, with the nodes running in Docker
  • no other service, just storagenode, but multiple of them
  • yes, running more than one node on that machine

ulimit -a

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62814
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 62814
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

When running more than one node on that system, the file handle limit quickly becomes a problem.
You have two choices: either increase that limit or, my recommendation, run fewer but bigger nodes. :slight_smile:
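Since the nodes run in Docker, the open-file limit can also be raised per container via docker's --ulimit flag. This is only a minimal sketch: the container name and port mapping are placeholders, and the identity/storage mounts the node normally needs are omitted.

# raise the per-container open-file limit (soft:hard) when starting a node
docker run -d --name storagenode1 \
  --ulimit nofile=65536:65536 \
  -p 20858:28967/tcp \
  storjlabs/storagenode:latest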

To read the current system-level default, run:

root# systemctl show --property DefaultLimitNOFILE

It sounds like the solution is to modify

/etc/systemd/system.conf

and uncomment the "#DefaultLimitNOFILE=" line, setting it to a value such as "DefaultLimitNOFILE=65536".
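A sketch of that change (65536 is just an example value; after editing, systemd has to re-read its configuration and the affected services need a restart):

# /etc/systemd/system.conf (excerpt)
DefaultLimitNOFILE=65536

# make systemd pick up the new default
systemctl daemon-reexec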

Also modify

/etc/sysctl.conf

and add a value at the end of the file, such as "fs.file-max = 65536".
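A sketch of that change, applied without a reboot (again, 65536 is only an example value):

# /etc/sysctl.conf (excerpt)
fs.file-max = 65536

# reload the file and verify the new kernel-wide limit
sysctl -p
sysctl fs.file-max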

The current limits are shown above. Just increasing them to such high values is not a good choice unless you are absolutely sure about the consequences…
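One way to judge whether the kernel-wide maximum is actually close to being exhausted (as opposed to the 1024 per-process limit in the ulimit output above) is to look at the kernel's file handle accounting:

# three fields: allocated handles, unused handles, and the system-wide maximum (fs.file-max)
cat /proc/sys/fs/file-nr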

Well, the problem is that the machine had been running stably for more than six months, but now some of the nodes on it are getting errors.

Thanks for the hints, I will try to play with those config values.