Processes-exit-eventlistener to die

Hello,
One of my nodes stopped working. I haven’t changed anything and haven’t even logged in to that machine for a long time. Rebooting, restarting, and stopping, removing and restarting the container did not help. I even tried cleaning up Docker with system, volume and network prune. I keep getting the same error when looking at the dashboard:

2023-05-31T08:02:54.697Z INFO Configuration loaded {“Process”: “storagenode”, “Location”: “/app/config/config.yaml”}
2023-05-31T08:02:54.698Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “server.private-address”}
2023-05-31T08:02:54.698Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “operator.wallet”}
2023-05-31T08:02:54.698Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “operator.email”}
2023-05-31T08:02:54.699Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “server.address”}
2023-05-31T08:02:54.699Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “storage.allocated-bandwidth”}
2023-05-31T08:02:54.699Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “storage.allocated-disk-space”}
2023-05-31T08:02:54.700Z INFO Invalid configuration file key {“Process”: “storagenode”, “Key”: “contact.external-address”}
2023-05-31T08:02:54.701Z INFO Anonymized tracing enabled {“Process”: “storagenode”}
2023-05-31T08:02:54.744Z INFO Identity loaded. {“Process”: “storagenode”, “Node ID”: “xxxxxxxxxxxx”}
Error: rpc: dial tcp 127.0.0.1:7778: connect: connection refused

In the log I see this:

2023-05-31T07:55:09.158Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode”, “Version”: “v1.78.3”}
2023-05-31T07:55:09.158Z INFO New version is being rolled out but hasn’t made it to this node yet {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-05-31T07:55:09.188Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.78.3”}
2023-05-31T07:55:09.188Z INFO New version is being rolled out but hasn’t made it to this node yet {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-05-31 07:55:10,190 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-05-31 07:55:10,191 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-05-31 07:55:10,191 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Error: Error starting master database on storagenode: database: bandwidth opening file “config/storage/bandwidth.db” failed: disk I/O error: input/output error
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:331
storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:308
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:283
storj.io/storj/storagenode/storagenodedb.OpenExisting:250
main.cmdRun:62
main.newRunCmd.func1:32
storj.io/private/process.cleanup.func1.4:399
storj.io/private/process.cleanup.func1:417
github.com/spf13/cobra.(*Command).execute:852
github.com/spf13/cobra.(*Command).ExecuteC:960
github.com/spf13/cobra.(*Command).Execute:897
storj.io/private/process.ExecWithCustomOptions:113
main.main:29
runtime.main:250
2023-05-31 07:55:41,143 INFO exited: storagenode (exit status 1; not expected)
2023-05-31 07:55:42,154 INFO spawned: ‘storagenode’ with pid 42
2023-05-31 07:55:42,157 WARN received SIGQUIT indicating exit request
2023-05-31 07:55:42,160 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-05-31T07:55:42.160Z INFO Got a signal from the OS: “terminated” {“Process”: “storagenode-updater”}
2023-05-31 07:55:42,169 INFO stopped: storagenode-updater (exit status 0)
2023-05-31T07:55:42.332Z INFO Configuration loaded {“Process”: “storagenode”, “Location”: “/app/config/config.yaml”}
2023-05-31T07:55:42.333Z INFO Anonymized tracing enabled {“Process”: “storagenode”}
2023-05-31T07:55:42.355Z INFO Operator email {“Process”: “storagenode”, “Address”: “xxxxxxxxt"}
2023-05-31T07:55:42.355Z INFO Operator wallet {“Process”: “storagenode”, “Address”: “xxxxxxxxxxxxxxxxxxxx”}
2023-05-31 07:55:45,361 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-31 07:55:48,367 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-31 07:55:51,371 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-05-31 07:55:52,373 WARN killing ‘storagenode’ (42) with SIGKILL
2023-05-31 07:55:52,377 INFO stopped: storagenode (terminated by SIGKILL)
2023-05-31 07:55:52,379 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

I tried to look for the same error here on the forum, but did not find anything helpful.
Thank you.

Error: Error starting master database on storagenode: database: bandwidth opening file “config/storage/bandwidth.db” failed: disk I/O error: input/output error

Check your disk
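
One quick way to confirm that the failure comes from the disk rather than from the node software is to read the affected file from start to end and see whether the operating system reports an I/O error. The following is only a minimal sketch: the function names `read_through` and `describe` are made up for illustration, and the path you pass must be the location of `bandwidth.db` on your host, which is not shown in this thread.

```python
import errno


def read_through(path, chunk_size=1024 * 1024):
    """Read the whole file sequentially; return (bytes_read, error_or_None)."""
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as exc:
        # Covers a missing file, permission problems and low-level read errors.
        return total, exc
    return total, None


def describe(path):
    """Turn the result of read_through() into a one-line summary."""
    total, err = read_through(path)
    if err is None:
        return f"{path}: read {total} bytes without errors"
    if err.errno == errno.EIO:
        # EIO is the errno behind "input/output error" messages.
        return f"{path}: I/O error after {total} bytes (check disk, cable, port)"
    return f"{path}: failed after {total} bytes: {err}"
```

Calling `print(describe("/path/to/storage/bandwidth.db"))` with your own path gives a first hint. A clean read does not prove the drive is healthy; it only shows that this one file is readable right now.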

I’ve run e2fsck, but it did not help.
Can I rebuild this DB?

How is your disk connected?

USB or SATA? If it’s USB, try changing USB ports. If it still doesn’t work, try changing the USB cable.

Please elaborate here. How did you run it? With -f and -ck? Did it fix any issues, or did it not find any?

Perhaps you need to use a specific tool. What’s your filesystem?

df -T --si

I’m on Ubuntu.
Should I try to rebuild the DB the standard way?

It looks like this is a physical fault of the drive. At the moment it is somehow alive again, and the data is being copied to a new drive, probably together with the corrupted DB.

There used to be a much more difficult tutorial on rebuilding the DB.
Now I see a very simple one. Is it a new one? A smarter one? Or are there two different ones?

If you have a malformed database, you have two options:

  1. try to recover the malformed database (keeping most of the stats and historical data)
  2. re-create the malformed database (losing the stats and historical data)

For the “file is not a database” error you have only one option - to re-create a database.
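
To see which of these cases applies, you can run an integrity check on the file first. The node databases are SQLite files, so SQLite's `PRAGMA integrity_check` works on them. Below is a minimal sketch using Python's standard `sqlite3` module; the function name `check_database` and the returned labels are invented for illustration and are not part of any Storj tooling. Stop the node before checking, and work on a copy if you can.

```python
import sqlite3
from pathlib import Path


def check_database(path):
    """Return 'ok', 'malformed', 'not a database' or 'unreadable'."""
    # mode=ro opens the file read-only, so the check never creates or
    # modifies it.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.OperationalError:
        # Missing file, permission problem or disk I/O error.
        return "unreadable"
    except sqlite3.DatabaseError as exc:
        # SQLite reports a bad file header as "file is not a database".
        if "not a database" in str(exc):
            return "not a database"
        return "malformed"
    # A healthy database yields exactly one row containing "ok".
    return "ok" if rows == [("ok",)] else "malformed"
```

A result of `malformed` leaves both options above open, while `not a database` leaves only re-creation. `unreadable` means the file could not be opened or read at all, which points back at the disk rather than at the database contents.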

So the copying is done. It went through with some errors (on bandwidth.db), BUT the node started from the new HDD with no problem at all. It all looks OK and is running fine :) So for now I don’t think I will mess around with recovery or re-creation. Thank you for your help.

It’s better to fix or re-create the database, following the articles above; otherwise your dashboard will not work normally, and even your suspension score could be affected.
