Computer reboot = node does not restart (Fatal errors, DB incorrect, all fun stuff)

Hi all,

I run one node on an HDD (D:\) on Windows 10, installed with the Windows GUI installer. The computer rebooted (thanks, Windows Update) and the node is now corrupted (see below).

  1. After the reboot I checked whether the node was running. It was not, so I ran Start-Service storagenode

  2. The node ran for a few minutes and then stopped, so I checked the logs, which reported that a file was not a database

  3. I read the docs, installed sqlite3 and checked which database files were corrupted:

Error: file is not a database
bandwidth.db
info.db ok
Error: disk I/O error
notifications.db
Error: file is not a database
orders.db
Error: file is not a database
pieceinfo.db
Error: file is not a database
piece_expiration.db
Error: file is not a database
piece_spaced_used.db
pricing.db ok
Error: database disk image is malformed
reputation.db *** in database main *** Page 2: btreeInitPage() returns error code 11 Page 3: btreeInitPage() returns error code 11
Error: file is not a database
satellites.db
storage_usage.db ok
used_serial.db ok
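
For anyone repeating this check, here is a minimal Python sketch that runs SQLite's `PRAGMA integrity_check` on every `.db` file in a folder. It is not the procedure from the docs, only an illustration: the function name `check_databases` is made up for this example, and it assumes the node is stopped and that opening the files read-only is sufficient (a database left in WAL mode may need a normal connection instead).

```python
import sqlite3
from pathlib import Path


def check_databases(db_dir):
    """Run PRAGMA integrity_check on every *.db file in db_dir.

    Returns a dict mapping each file name to "ok" or to an error text.
    """
    results = {}
    for db_path in sorted(Path(db_dir).glob("*.db")):
        # Assumption: read-only access is enough for the check, so a
        # damaged file is never modified by this script.
        uri = db_path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check;").fetchall()
            finally:
                conn.close()
            # A healthy database returns the single row "ok"; otherwise
            # each row describes one problem that was found.
            results[db_path.name] = "; ".join(str(row[0]) for row in rows)
        except sqlite3.DatabaseError as exc:
            # Raised, for example, when the file is not a database at all.
            results[db_path.name] = f"Error: {exc}"
    return results
```

Printing the returned dict for the folder that holds the node's databases gives one line per file, which is easier to read than interleaved error output.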

  4. I searched this forum and found a thread that suggested backing up the corrupted files and removing them so that the node recreates them, so I did that:

2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "info"}
2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "bandwidth"}
2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "orders"}
2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "piece_expiration"}
2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "pieceinfo"}
2021-08-29T21:01:42.597+0700 INFO db database does not exists {"database": "piece_spaced_used"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "reputation"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "storage_usage"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "used_serial"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "satellites"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "notifications"}
2021-08-29T21:01:42.598+0700 INFO db database does not exists {"database": "heldamount"}
2021-08-29T21:01:42.599+0700 INFO db database does not exists {"database": "pricing"}
2021-08-29T21:01:42.599+0700 INFO db database does not exists {"database": "secret"}

  5. Restarted the node with Start-Service storagenode

  6. Then a new error appeared:

2021-08-29T21:01:44.703+0700 FATAL Unrecoverable error {"error": "Error creating tables for master database on storagenode: migrate: creating version table failed: migrate: unable to open database file: The handle is invalid…

  7. Stopped the node with Stop-Service storagenode

  8. OK, this was getting difficult, so back to Google. I found a thread where @Alexey said that orders.db cannot be recreated. Since that file was one of the corrupted ones, I followed the thread and recreated orders.db manually:

SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> CREATE TABLE unsent_order (
   ...> satellite_id BLOB NOT NULL,
   ...> serial_number BLOB NOT NULL,
   ...> order_limit_serialized BLOB NOT NULL, -- serialized pb.OrderLimit
   ...> order_serialized BLOB NOT NULL, -- serialized pb.Order
   ...> order_limit_expiration TIMESTAMP NOT NULL, -- when is the deadline for sending it
   ...> uplink_cert_id INTEGER NOT NULL,
   ...> FOREIGN KEY(uplink_cert_id) REFERENCES certificate(cert_id)
   ...> );
sqlite> CREATE TABLE order_archive_ (
   ...> satellite_id BLOB NOT NULL,
   ...> serial_number BLOB NOT NULL,
   ...> order_limit_serialized BLOB NOT NULL,
   ...> order_serialized BLOB NOT NULL,
   ...> uplink_cert_id INTEGER NOT NULL,
   ...> status INTEGER NOT NULL,
   ...> archived_at TIMESTAMP NOT NULL,
   ...> FOREIGN KEY(uplink_cert_id) REFERENCES certificate(cert_id)
   ...> );
sqlite> CREATE UNIQUE INDEX idx_orders ON unsent_order(satellite_id, serial_number);
sqlite> CREATE TABLE versions (version int, commited_at text);
sqlite> .exit
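
The same statements can be applied from a script instead of typing them at the prompt. The sketch below is only an illustration: the schema is copied from the session above and has not been checked against any particular storagenode version, and the function name `recreate_orders_db` is hypothetical. It expects a path where no orders.db exists yet (back up and remove the corrupted file first).

```python
import sqlite3

# Schema copied from the sqlite3 session above; whether it matches the
# schema a given storagenode version expects is an assumption.
ORDERS_SCHEMA = """
CREATE TABLE unsent_order (
    satellite_id BLOB NOT NULL,
    serial_number BLOB NOT NULL,
    order_limit_serialized BLOB NOT NULL, -- serialized pb.OrderLimit
    order_serialized BLOB NOT NULL, -- serialized pb.Order
    order_limit_expiration TIMESTAMP NOT NULL, -- deadline for sending it
    uplink_cert_id INTEGER NOT NULL,
    FOREIGN KEY(uplink_cert_id) REFERENCES certificate(cert_id)
);
CREATE TABLE order_archive_ (
    satellite_id BLOB NOT NULL,
    serial_number BLOB NOT NULL,
    order_limit_serialized BLOB NOT NULL,
    order_serialized BLOB NOT NULL,
    uplink_cert_id INTEGER NOT NULL,
    status INTEGER NOT NULL,
    archived_at TIMESTAMP NOT NULL,
    FOREIGN KEY(uplink_cert_id) REFERENCES certificate(cert_id)
);
CREATE UNIQUE INDEX idx_orders ON unsent_order(satellite_id, serial_number);
CREATE TABLE versions (version int, commited_at text);
"""


def recreate_orders_db(path):
    """Create an empty orders database at path and return its table names."""
    conn = sqlite3.connect(path)
    try:
        # Fails with an error if the tables already exist in the file.
        conn.executescript(ORDERS_SCHEMA)
        conn.commit()
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

The returned list makes it easy to confirm that all three tables were created before restarting the node.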

  9. Then restarted the node with Start-Service storagenode

  10. New errors (at least it’s doing something):

2021-08-29T21:27:29.975+0700 INFO trust Scheduling next refresh {"after": "3h2m56.24716803s"}
2021-08-29T21:27:30.075+0700 ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: open D:\trash\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied.", "errorVerbose": "pieces error: filestore error: open D:\trash\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied…(*blobStore).EmptyTrash:154…(*BlobsUsageCache).EmptyTrash:310…(*Store).EmptyTrash:367…(*TrashChore).Run.func1:51…(*Cycle).Run:92…(*Cycle).Start.func1:71…(*Group).Go.func1:57"}
2021-08-29T21:27:30.100+0700 ERROR piecestore:cache error getting current used space: {"error": "open D:\blobs\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied.; open D:\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: Access is denied.", "errorVerbose": "group:\n--- open D:\blobs\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied.\n--- open D:\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: Access is denied."}
2021-08-29T21:27:30.100+0700 ERROR services unexpected shutdown of a runner {"name": "piecestore:cache", "error": "open D:\blobs\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied.; open D:\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: Access is denied.", "errorVerbose": "group:\n--- open D:\blobs\pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: Access is denied.\n--- open D:\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: Access is denied."}
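
The errors above come from opening folders under D:\blobs and D:\trash. A quick way to see which folders refuse to open is to try listing them from a script. This is a rough Python sketch, not a Storj tool: `find_unreadable_dirs` and its `max_depth` parameter are invented for this example, and the result is only meaningful when the script runs under the same account as the storagenode service.

```python
import os


def find_unreadable_dirs(root, max_depth=2):
    """Return (path, error text) pairs for folders that cannot be listed.

    Only descends max_depth levels below root, because a node's blobs
    folder can hold a very large number of entries.
    """
    problems = []
    stack = [(root, 0)]
    while stack:
        current, depth = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Queue subfolders until the depth limit is reached.
                    if depth < max_depth and entry.is_dir(follow_symlinks=False):
                        stack.append((entry.path, depth + 1))
        except OSError as exc:
            # Covers "Access is denied" as well as missing or offline paths.
            problems.append((current, str(exc)))
    return problems
```

An empty list means every folder within the depth limit could be listed at that moment. If the list changes between runs, the problem is more likely the drive dropping out than the permissions themselves.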

  11. Stopped the node, searched Google again and found the post Error: "open F:\blobs\.. of: Access is denied", where @Alexey asked to check the security permissions. I did, and found nothing wrong IMO.

  12. OK, I’m stuck and I need help.

If you have read this far and have any ideas, please do not hesitate to share them.

Thanks

(Sorry, new users can only post 1 screenshot)
(Sorry, new users can only post 2 links)

Hi @Mickael,
If the node is not very old, I would consider starting again, as it’s unusual to have this many issues. If it’s worth keeping, then to me all of your issues point to problems with the storage drive you are using - the databases got malformed because the disk isn’t right, and once that was fixed the permissions on the files seem to be messed up.

I’d run chkdsk with the fix option (/f) on the drive, and if anything is found, run it a few more times. Then I’d check the file permissions for the files with the errors:

Hi @Stob ,

Thanks for your reply. Indeed, that is a lot of issues in a short amount of time.

I wanted to learn how to fix them, but being a node operator is not easy, and that could discourage the newcomers who would grow the network.

I was also hoping the Windows GUI would offer an easy repair tool that automatically takes care of the most frequent issues, but it seems this does not exist.

I can see a “repair” option in the GUI, but it is not enabled and I cannot find any documentation about it.

That would just be a repair on the installation/program files. It wouldn’t do anything for corrupt databases, disk issues or incorrect configuration settings.

It’s been mentioned a few times on the forum, but running a node requires some technical knowledge to set up the node and configure the appropriate port forwarding, DNS/IP, ETH wallet, etc. In fact, requiring some technical knowledge makes the network more robust, as you don’t get as many ‘fly by night’ operators.

@Stob Yes, that makes sense, but port forwarding, DNS/IP, etc. are not very complicated. Issues like the one I faced cost more time: the documentation gives little guidance, and finding the root cause or a fix takes a long time compared to the earnings.

But that is another subject. To come back to my original issue: I scanned the drive for errors and bad sectors, and the disk is clean.

I suspect a hardware problem: the SATA connector in the external enclosure gets too hot after 24 hours of constant use, which makes the drive unavailable (hence the "Access is denied" error).

I am not totally sure, but so far that is the only explanation. Unfortunately, when this issue occurs the node goes offline and the data is corrupted, with no way to recover anything.
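
One way to test a suspicion like this is to probe the drive at regular intervals and note when it stops responding. The Python sketch below is only an illustration under that assumption: `probe_drive` and `monitor` are made-up names, and a failed probe shows that the drive was unreachable at that moment, not why.

```python
import os
import time
import uuid
from datetime import datetime


def probe_drive(directory):
    """Write, read back and delete a tiny file; return (ok, detail)."""
    probe_path = os.path.join(directory, f"probe-{uuid.uuid4().hex}.tmp")
    payload = b"drive probe"
    try:
        with open(probe_path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the write through to the disk
        with open(probe_path, "rb") as handle:
            matches = handle.read() == payload
        os.remove(probe_path)
        return (True, "ok") if matches else (False, "read-back mismatch")
    except OSError as exc:
        # Typical when the drive has dropped out or access is denied.
        return False, str(exc)


def monitor(directory, interval_seconds=60, rounds=3):
    """Probe the drive `rounds` times and return (timestamp, ok, detail) rows."""
    history = []
    for index in range(rounds):
        ok, detail = probe_drive(directory)
        stamp = datetime.now().isoformat(timespec="seconds")
        history.append((stamp, ok, detail))
        if index < rounds - 1:
            time.sleep(interval_seconds)
    return history
```

Running `monitor` over a day with a larger `rounds` value and comparing the failure times with the enclosure temperature would support or rule out the overheating theory.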

Maybe that will help someone in the future.