Keep getting the following error when starting the docker node:
2023-11-14T21:35:52Z INFO Configuration loaded {"process": "storagenode", "Location": "/app/config/config.yaml"}
2023-11-14T21:35:52Z INFO Anonymized tracing enabled {"process": "storagenode"}
2023-11-14T21:35:52Z INFO Operator email {"process": "storagenode", "Address": ""}
2023-11-14T21:35:52Z INFO Operator wallet {"process": "storagenode", "Address": "0xEDA46769ac169097A0A90F355a92BB4098B04334"}
2023-11-14T21:35:52Z INFO server kernel support for server-side tcp fast open remains disabled. {"process": "storagenode"}
2023-11-14T21:35:52Z INFO server enable with: sysctl -w net.ipv4.tcp_fastopen=3 {"process": "storagenode"}
2023-11-14T21:35:52Z INFO Telemetry enabled {"process": "storagenode", "instance ID": "12h7iT8pxEJiQkkTRaQFvUomifVD7mRWosYuNwV4TkB7d4JZvAr"}
2023-11-14T21:35:52Z INFO Event collection enabled {"process": "storagenode", "instance ID": "12h7iT8pxEJiQkkTRaQFvUomifVD7mRWosYuNwV4TkB7d4JZvAr"}
2023-11-14T21:35:52Z FATAL Unrecoverable error {"process": "storagenode", "error": "Error creating tables for master database on storagenode: migrate: database: info opening file "config/storage/info.db" failed: unable to open database file: no such file or directory\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:347\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Migration.func1:657\n\tstorj.io/storj/private/migrate.(*Migration).Run:164\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:370\n\tmain.cmdRun:95\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:402\n\tstorj.io/private/process.cleanup.func1:420\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:267", "errorVerbose": "Error creating tables for master database on storagenode: migrate: database: info opening file "config/storage/info.db" failed: unable to open database file: no such file or directory\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:347\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Migration.func1:657\n\tstorj.io/storj/private/migrate.(*Migration).Run:164\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:370\n\tmain.cmdRun:95\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:402\n\tstorj.io/private/process.cleanup.func1:420\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:267\n\tmain.cmdRun:97\n\tmain.newRunCmd.func1:32\n\tstorj.io/private/process.cleanup.func1.4:402\n\tstorj.io/private/process.cleanup.func1:420\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomOptions:113\n\tmain.main:30\n\truntime.main:267"}
All 3 nodes I am running are identical configuration-wise aside from their ports and mount points. This node’s mount point didn’t experience any errors, and I am able to read from and write to the drive without an issue.
The file it’s complaining about, info.db, is there:
[root@jonesmedia storage]# pwd
/mnt/StorjData/storage/storage
[root@jonesmedia storage]# ls -al info.db
-rwxr-xr-x 1 root root 16384 Nov 14 01:17 info.db
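For what it’s worth, the file also opens as a SQLite database from the host (sqlite3 is installed separately on the host, this is just a sanity check, not part of the node):
$ sqlite3 /mnt/StorjData/storage/storage/info.db 'PRAGMA integrity_check;'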
Are you sure your drives are connected and accessible to the docker process?
It seems like your files aren’t there.
Do you run your docker as root (assuming you run docker at all)?
And are you really sure the mount points you’re able to access are the same mount points the storagenode is using? In other words, are you passing the right arguments to the docker process?
What file system are you using? And in what way are they connected (USB, SATA, NVMe…?).
Perhaps you should show the script you’re using.
Further, you could search for "Error creating tables for master database on storagenode" in the forum. There are plenty of topics like this one.
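One quick way to check what the docker process actually sees from the inside (replace "storagenode" with your container name; the path assumes the usual mapping of your data directory to /app/config):
$ docker exec storagenode ls -al /app/config/storage/
The node resolves "config/storage/info.db" relative to /app inside the container, so that listing should show the same info.db you see on the host if the bind mounts are right.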
Yes, I am running it as root, and the filesystem is XFS.
I’ll do some more searching on the forum. This issue makes no damn sense. Nothing changed before it went down last night. It just… went down and now refuses to start back up saying it can’t find the file.
The filesystem/mount is clean. I am able to read/write to it without an issue, and it’s XFS so there’s not really a check you can run on it.
I made the requested change in the compose file but it didn’t make a difference. There is one change though, it’s complaining about the secrets.db file not being there now. Same “File not found” error.
However, if I enter the container, I can copy the same dang file it’s saying isn’t there!
Why the heck would it say the file isn’t there, if I can even copy it? This makes no sense to me. To reiterate, I have two other storj nodes running on this same physical hardware, same exact config and compose files except for the mountpoints, working just fine.
2023-11-15T05:21:49Z FATAL Unrecoverable error {"process": "storagenode", "error": "Error starting master database on storagenode: database: secret opening file "config/storage/secret.db" failed: unable to open database file: no such file or directory
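To show exactly what I mean, both of these succeed from the host (storagenode3 is just what I named this container):
$ docker exec storagenode3 ls -al /app/config/storage/secret.db
$ docker exec storagenode3 cp /app/config/storage/secret.db /tmp/secret.db.bak
And yet the node still insists the file isn’t there on startup.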
I got something similar yesterday with one of my nodes. All of a sudden the node went offline on UptimeRobot.
I checked the log and it was complaining about not being able to open revocations.db
Looks like my mount point had somehow become read-only. I stopped the node, unmounted the drive, mounted it back read-write, and started the node. Node started normally.
Hope it helps.
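If you want to check for the same thing, the mount options give it away immediately (the path is from my setup, adjust to yours):
$ findmnt -no OPTIONS /media/storj/storj02
$ dmesg -T | grep -i 'remount\|read-only'
"ro" in the findmnt output means the kernel flipped the mount to read-only, and dmesg usually shows why.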
Thank you! I didn’t realize xfs_repair would actually check things. I have it running now. I don’t expect it to find anything wrong, as I haven’t seen anything in the syslogs indicating a filesystem issue and I can access the files inside the container, but I’ll report back once it’s done.
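For anyone else who didn’t know: xfs_repair also has a no-modify mode, so you can run a pure check without changing anything (the filesystem has to be unmounted either way):
$ xfs_repair -n /dev/sdd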
I could have been more clear about the config, sorry about that. The logs are writing to their own separate folders, so I am accessing the correct ones.
Are there any debug flags I can enable to be able to see the full path the process is trying to find the files with?
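Short of a debug flag, the closest I’ve found is asking docker which host paths are actually mapped into the container (storagenode3 is my container name):
$ docker inspect -f '{{json .Mounts}}' storagenode3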
xfs_repair finished and didn’t indicate any errors:
[root@jonesmedia storj]# xfs_repair /dev/sdd
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
[root@jonesmedia storj]#
However, after starting the node… it stays up now and can apparently see all the database files!
What in the world? The file it was complaining about not being able to see, the one I could access just fine inside the container manually, can apparently be found now that xfs_repair has finished, without the utility reporting a single error.
Most likely it was the unmount/re-mount cycle: xfs_repair requires the file system to be unmounted before the repair, so it got re-mounted fresh afterwards.
The same steps (unmount/re-mount) helped me to bring my node online.
stopped the node
$ sudo umount /media/storj/storj02
scanned filesystem, found some errors
$ sudo fsck.ext4 -p /dev/sdb1
B509 contains a file system with errors, check forced.
B509: Inode 131747, i_blocks is 10256, should be 10232. FIXED.
B509: Inode 131762, i_blocks is 1728, should be 1720. FIXED.
B509: 6498271/366215168 files (1.6% non-contiguous), 2689840688/2929720832 blocks
mounted filesystem back
$ sudo mount -a
started the node, so far so good.
What can cause a rw filesystem to re-mount on its own as ro?
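On ext4, at least, one common mechanism is the errors=remount-ro mount option: the kernel flips the filesystem to read-only the moment it detects corruption or an I/O error, rather than keep writing to a damaged volume. What’s configured on a given device, and why it flipped, can be checked with something like (device taken from the post above):
$ sudo tune2fs -l /dev/sdb1 | grep -i 'errors behavior'
$ sudo dmesg -T | grep -iE 'i/o error|remount'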