StorJ nodes stuck in loop - too many open files

Started seeing this a couple days ago, figure it’s probably related to the announcement here: Updates on Test Data

Host is running Debian linux 12 on an i7-5960X with 64GB ram, has an ssd as the OS drive and a pair of 12TB hdds connected to the sata3 interface for StorJ data.

[additional note] The nodes are using about 350GB and 450GB each out of an available 10TB of space, and the OS disk has only used about 40% of it’s available space.

There are two StorJ docker nodes running (along with the watchtower and multinode containers, ) each storj node saving to it’s own 12TB hdd. These are the log snippets from one of the nodes, but I’m seeing the exact same loop of log entries on both of them.

Regardless of whether I start one or both node containers, they run for a couple minutes and then drop into this loop of crash and restart. This loop repeats itself with the same exact info until I stop the container. I can pull a longer, more complete log if this doesn’t have enough information to be useful.

2024-05-01 16:02:03,496 INFO RPC interface 'supervisor' initialized                                                                                                                                   
2024-05-01 16:02:03,497 INFO supervisord started with pid 1                                                                                                                                           
2024-05-01 16:02:04,500 INFO spawned: 'processes-exit-eventlistener' with pid 12                                                                                                                      
2024-05-01 16:02:04,502 INFO spawned: 'storagenode' with pid 13                                                                                                                                       
2024-05-01 16:02:04,504 INFO spawned: 'storagenode-updater' with pid 14                                                                                                                               
2024-05-01T16:02:04Z    INFO    Configuration loaded    {"Process": "storagenode-updater", "Location": "/app/config/config.yaml"}                                                                     
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.email"}                                                                           
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.wallet-features"}                                                                 
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "console.address"}                                                                          
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "healthcheck.details"}                                                                      
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "storage.allocated-bandwidth"}                                                              
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "server.private-address"}                
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "server.address"} 
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "contact.external-address"}           
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "healthcheck.enabled"}
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "storage.allocated-disk-space"}
2024-05-01T16:02:04Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.wallet"}
2024-05-01T16:02:04Z    INFO    Anonymized tracing enabled      {"Process": "storagenode-updater"}
2024-05-01T16:02:04Z    INFO    Running on version      {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.102.3"}
2024-05-01T16:02:04Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2024-05-01T16:02:04Z    INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}                                                                             2024-05-01T16:02:04Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}                                                                                                            2024-05-01T16:02:04Z    INFO    Operator email  {"Process": "storagenode", "Address": "i80+storj1@unleashedthoughts.com"}                                                                             2024-05-01T16:02:04Z    INFO    Operator wallet {"Process": "storagenode", "Address": "0x61fef7c3aac235b029cec0accde2a3848b7b904c"}                                                                   2024-05-01T16:02:04Z    ERROR   failure during run      {"Process": "storagenode", "error": "Error opening database on storagenode: database: used_space_per_prefix opening file \"config/storage/used_space_per_prefix.db\" failed: unable to open database file: too many open files\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:364\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:341\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:316\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:281\n\tmain.cmdRun:65\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267", "errorVerbose": "Error opening database on storagenode: database: used_space_per_prefix opening file \"config/storage/used_space_per_prefix.db\" failed: unable to open database file: too many open files\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:364\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:341\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:316\n\tstorj.io/storj/storagenode/storagenodedb.O
penExisting:281\n\tmain.cmdRun:65\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:
983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267\n\tm
ain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/sp
f13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267"}
Error: Error opening database on storagenode: database: used_space_per_prefix opening file "config/storage/used_space_per_prefix.db" failed: unable to open database file: too many open files
        storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:364
        storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:341
        storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:316
        storj.io/storj/storagenode/storagenodedb.OpenExisting:281
        main.cmdRun:65                                                                             
        main.newRunCmd.func1:33                                                                    
        storj.io/common/process.cleanup.func1.4:393   
        storj.io/common/process.cleanup.func1:411        
        github.com/spf13/cobra.(*Command).execute:983
        github.com/spf13/cobra.(*Command).ExecuteC:1115
        github.com/spf13/cobra.(*Command).Execute:1039                        
        storj.io/common/process.ExecWithCustomOptions:112      
        main.main:34                                                                                                                                                                                  
        runtime.main:267                                                                           
2024-05-01 16:02:04,665 INFO exited: storagenode (exit status 1; not expected) 

I’ve been running storage nodes for years and haven’t seen any issues with too many files being opened aside from this one: V1.6.4 too many open files bug
While that is similar, doesn’t seem to be quite the same.

Any thoughts?

1 Like

You dont think the earlier errors about the invalid configuration keys isn’t the issue?

I hadn’t looked too closely at them since they were marked as ‘info’ and not ‘warn’ or ‘error’.

After rebooting the host both nodes will start up successfully, fully operational web interface and all, and run for about 15 minutes before dropping into this loop

It also doesn’t seem to be limited to just storj. While the nodes are running in this loop I’m also unable to open or create any new files even on the host OS drive; I get a warning from Debian that there are too many open files.

As soon as I stop the storj containers then the OS is back to normal where I can create and edit files.

I think the answer is here in Stephen’s reply.

Alrighty, that’s a different approach than what I’ve already tried. I’ll try applying that when I get back and will update this post with the results.

2 Likes

This is a bug and needs to be fixed.

Storagenode updater uses the same config file as storagenode, but it does not understand (nor needs to!) half of the keys there. Instead of ignoring them, it screams in the logs.

That log message shall be demoted from INFO to DEBUG, or removed altogether. It shall only pay attention to keys it actually going to use.

4 Likes

The part I’m getting hung up on with that post you linked, is that (as far as I can tell) I’m not using any sytemd service to run/manage the docker container… It’s just a basic docker install with no dedicated docker/storj user account. So it seems to me that I should be changing the NOFile limit for the user account itself…

You can try to run the container with --ulimit parameter:

docker run --ulimit nofile=5000:5000

I re-ran my two nodes with that parameter and confirmed with docker -exec that it did update the container’s NOFile limits, however they are still exhibiting the same behavior and outputting the same log messages

Well, you could try raising the number higher. If not, I would recommend asking in a Docker forum for assistance as it seems to be Docker related.

1 Like