Error starting node, `Error starting master database [...] heldamount.db failed: file is not a database`

roryboreyalice · April 12, 2022, 8:09pm

Hello,

Having an issue on a node. Noticed today that I couldn’t access the web dashboard. Checked the logs, found what is included below.

The storage node is located at /mnt/storj_03/. Its identity is in the default directory and is named storagenode3.

2022-04-12T20:01:55.754Z INFO Current binary version {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-12T20:01:55.755Z INFO Version is up to date {"Service": "storagenode"}
2022-04-12T20:01:55.766Z INFO Current binary version {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-12T20:01:55.766Z INFO Version is up to date {"Service": "storagenode-updater"}
2022-04-12 20:01:56,768 INFO success: processes entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-12 20:01:56,773 INFO spawned: 'storagenode' with pid 53
2022-04-12 20:01:56,774 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-12T20:01:56.837Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2022-04-12T20:01:56.839Z INFO Operator email {"Address": "Omitted_for_forum"}
2022-04-12T20:01:56.839Z INFO Operator wallet {"Address": "Omitted_for_forum"}
Error: Error starting master database on storagenode: database: heldamount opening file "config/storage/heldamount.db" failed: file is not a database
    storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:324
    storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:306
    storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281
    storj.io/storj/storagenode/storagenodedb.OpenExisting:248
    main.cmdRun:193
    storj.io/private/process.cleanup.func1.4:363
    storj.io/private/process.cleanup.func1:381
    github.com/spf13/cobra.(*Command).execute:852
    github.com/spf13/cobra.(*Command).ExecuteC:960
    github.com/spf13/cobra.(*Command).Execute:897
    storj.io/private/process.ExecWithCustomConfig:88
    storj.io/private/process.ExecCustomDebug:70
    main.main:474
    runtime.main:255
2022-04-12 20:01:56,855 INFO exited: storagenode (exit status 1; not expected)

A bit unsure what to do here, any advice is appreciated.

shade-bot · April 12, 2022, 9:36pm

Check the file first:

$ ls -la ./storage/heldamount.db
-rwxr-xr-x 1 root root 32768 Apr 12 08:44 ./storage/heldamount.db

What does yours look like?

roryboreyalice · April 12, 2022, 10:19pm

-rw-r--r-- 1 root root 81920 Apr 12 19:27 ./storage/heldamount.db
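
A directory listing only shows size and permissions; it cannot tell whether SQLite will accept the file. As a rough sketch (not the procedure from any official guide), the check below opens a database read-only and runs SQLite's `PRAGMA integrity_check`. It assumes Python 3 is available on the host and that the node is stopped first; the function name `check_sqlite_file` is made up for illustration, and the path in the usage comment is just the one from the error message.

```python
import pathlib
import sqlite3


def check_sqlite_file(path):
    """Return (ok, messages) for the SQLite file at `path`.

    Hypothetical helper for illustration. ok is True only when
    PRAGMA integrity_check reports a single row containing "ok".
    """
    # Build a file: URI and open read-only so the check cannot modify the file.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        # e.g. the file does not exist or cannot be opened
        return False, [str(exc)]
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    except sqlite3.DatabaseError as exc:
        # A file with a damaged header fails here, on the first statement.
        return False, [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return messages == ["ok"], messages


# Usage (path assumed from the error above; adjust to your own mount):
# ok, messages = check_sqlite_file("/mnt/storj_03/storage/heldamount.db")
# print(ok, messages)
```

If `ok` comes back False, the messages list holds either SQLite's own error text or the individual problems the integrity check found.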

shade-bot · April 13, 2022, 12:35am

Follow this guide:

roryboreyalice · April 13, 2022, 12:42am

Ran through everything it said, the logs show the following:

2022-04-13T00:41:46.593Z WARN contact:service failed PingMe request to satellite {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "ping satellite: check-in network: failed to ping node (ID: 12YCL4XTotnmnmEGT353kdF8NXshLfajujMMsMx6f4LcwnxUN1g) at address: gntl.cash:28969, err: contact: failed to dial storage node (ID: 12YCL4XTotnmnmEGT353kdF8NXshLfajujMMsMx6f4LcwnxUN1g) at address gntl.cash:28969 using QUIC: rpc: quic: timeout: no recent network activity", "errorVerbose": "ping satellite: check-in network: failed to ping node (ID: 12YCL4XTotnmnmEGT353kdF8NXshLfajujMMsMx6f4LcwnxUN1g) at address: gntl.cash:28969, err: contact: failed to dial storage node (ID: 12YCL4XTotnmnmEGT353kdF8NXshLfajujMMsMx6f4LcwnxUN1g) at address gntl.cash:28969 using QUIC: rpc: quic: timeout: no recent network activity\n\tstorj.io/storj/storagenode/contact.(*Service).requestPingMeOnce:194\n\tstorj.io/storj/storagenode/contact.(*Service).RequestPingMeQUIC:167\n\tstorj.io/storj/storagenode.(*Peer).addConsoleService:845\n\tstorj.io/storj/storagenode.(*Peer).Run:884\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:474\n\truntime.main:255"}

With that…the node is online according to the web dashboard, and seems happy. Thoughts?

Edit: Further review of the logs, things seem okay. Perhaps it needed some time? I see typical satellite chatter. Dashboard is reporting expected stats, showing a ton of offline time, so hopefully that corrects itself. Numbers all seem ‘right’.

shade-bot · April 13, 2022, 2:05am

Good to hear you're OK now! Any idea why your databases got corrupted? Did you have a power failure or some other unexpected event?

roryboreyalice · April 13, 2022, 11:28am

Exactly that. Half height PCI card shorted on chassis. System locked up. Had to first e2fsck the disk manually…then ran into the issue in OP.

Thank you for the help, seems happy again.