Database piece_spaced_used does not exist

robhodl · September 13, 2020, 8:50pm

This node has 8TB of data. I moved the node into a new case. Swapped some drives around and now when starting, I get this error:

Error starting master database on storagenode: database: database piece_spaced_used does not exist stat config/storage/piece_spaced_used.db: no such file or directory

Boot log:

storagenode    | 2020-09-13T20:26:18.453Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
storagenode    | 2020-09-13T20:26:18.467Z	INFO	Operator email	{"Address": "CENSOR"}
storagenode    | 2020-09-13T20:26:18.467Z	INFO	Operator wallet	{"Address": "0xCENSOR"}
storagenode    | Error: Error starting master database on storagenode: database: database piece_spaced_used does not exist stat config/storage/piece_spaced_used.db: no such file or directory
storagenode    | 	storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:344
storagenode    | 	storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:313
storagenode    | 	storj.io/storj/storagenode/storagenodedb.Open:245
storagenode    | 	main.cmdRun:151
storagenode    | 	storj.io/private/process.cleanup.func1.4:353
storagenode    | 	storj.io/private/process.cleanup.func1:371
storagenode    | 	github.com/spf13/cobra.(*Command).execute:840
storagenode    | 	github.com/spf13/cobra.(*Command).ExecuteC:945
storagenode    | 	github.com/spf13/cobra.(*Command).Execute:885
storagenode    | 	storj.io/private/process.ExecWithCustomConfig:88
storagenode    | 	storj.io/private/process.ExecCustomDebug:70
storagenode    | 	main.main:330
storagenode    | 	runtime.main:203

contents of the storage directory

bandwidth.db  garbage		 info.db    piece_expiration.db  reputation.db-wal	   temp   used_serial.db-shm
blobs	      heldamount.db-shm  orders.db  pieceinfo.db	 storage-dir-verification  trash  used_serial.db-wal

using linux with docker

nerdatwork · September 14, 2020, 2:18am

This file is missing from your storage.

BrightSilence · September 14, 2020, 8:53am

You can recreate it using sqlite3.

Navigate to the database path, then.

sqlite3 piece_spaced_used.db

Then do

CREATE TABLE versions (version int, commited_at text);
CREATE TABLE "piece_space_used" (
                                                total INTEGER NOT NULL DEFAULT 0,
                                                content_size INTEGER NOT NULL,
                                                satellite_id BLOB
                                        );
CREATE UNIQUE INDEX idx_piece_space_used_satellite_id ON piece_space_used(satellite_id);

Followed by

.quit

jammerdan · September 14, 2020, 9:08am

If there is such an easy solution I don’t understand why the node software cannot handle it without any or at least with less user interventions.

BrightSilence · September 14, 2020, 9:47am

Well that’s not for me to say.

But this really shouldn’t happen in normal operation. Fixing such things can hide underlying problems with the node setup or operation of the node. Which means bigger problems can go unnoticed for longer.

In general it’s probably not a good idea to automatically fix things when the cause is unknown.

robhodl · September 14, 2020, 1:44pm

Thanks, I ran that command to create the database. It seems I’m missing other databases as well.
When I compare the files in storage to whats in my other nodes, it seems I’m missing many databases.

Here’s a list of what’s missing:

piece_space_used.db (fixed with above query)
notifications.db
pricing.db
reputation.db
satellites.db
storage_usage.db

Is it even recoverable at this point?

Maybe there should be a place in the docs where these queries are stored to recreate all the databases.

baker · September 14, 2020, 1:49pm

If you are missing more than one database, my guess is you have input the wrong path with your docker run command. Please verify the contents of the path exactly as written in your docker run command.

robhodl · September 14, 2020, 1:55pm

My path is correct. I’m using docker-compose and its the same file I used since before the upgrade. The drive mount path hasn’t changed either. The only causes of these errors I can think of are possible corruption from improper shutdown, or the node attempted to start on the wrong drive at some point while I was upgrading (the latter is unlikely).

robhodl · September 14, 2020, 2:30pm

For each of the missing databases, I copied the database from another node. Then I opened it in sqlite and deleted data from all the tables.

For each database:
sqlite3> .tables
sqlite3> DELETE FROM table_name;

I thought this would give the same result as creating the databases from scratch. However, now Im getting this error:

Error: Error creating tables for master database on storagenode: migrate: table notifications already exists

robhodl · September 14, 2020, 2:58pm

I then dropped all the tables that it complained about until the node started running again.
sqlite3> DROP TABLE table_name;

As expected, all the information in the dashboard is empty. Can I expect my node’s reputation to suffer, will I be disqualified for having empty databases?

robhodl · September 14, 2020, 3:03pm

Yep, it was disqualified. RIP 8TB of data. Looks like I’m starting fresh. with this one.

BrightSilence · September 14, 2020, 3:06pm

No, that’s a fine approach if the data was missing already. It’s obviously not ideal since you’re definitely missing stats and earnings calculators as well as the dashboard will show incorrect information. some of that will be corrected automatically, like current space usage and reputation information. Which will either be recalculated or retrieved from the satellite. But historic space used and notifications will likely not be recovered.

Luckily all data in your databases are non-essential and you could even delete all databases and your node will still operate just fine. (If there are no databases at all, the node will recreate new empty ones automatically, but this is obviously something you want to avoid)

Edit: I just saw your update that it was disqualified. This is not due to the databases being gone as the node doesn’t need them to succeed audits. Most likely whatever cause half your databases to disappear also impacted the data on your node. So I guess you have some work to do to figure out why data is disappearing.

robhodl · September 14, 2020, 3:09pm

That’s good to know. So far it’s only been disqualified on one satellite. I have many as peers so I’ll just wait and see whats up. The drive is still full with data pieces so they’re not completely gone. I’ll report back if any more satellites disqualify me.

Edit: the node was offline for over 24 hours, maybe that was enough to disqualify it?

BrightSilence · September 14, 2020, 3:10pm

Have a look in your logs for lines with both failed and GET_AUDIT in them. That should tell you why you are getting disqualified.

robhodl · September 14, 2020, 3:26pm

Lots of GET and GET_REPAIR actions. About 1 out of 30 GET actions is failing with file does not exist. The failed actions are happening with many satellites but those same satellites are also getting successful actions. Hopefully, it will sort itself out without too much of a reputation ding.

BrightSilence · September 14, 2020, 3:31pm

Only the GET_AUDIT lines will count towards your reputation, but it looks like you have a lot of missing files. I’m afraid your node will likely not survive that. Probably just a matter of time for the other satellites.

robhodl · October 8, 2020, 5:44pm

So its been about a month, here’s an update. My node was disqualified on 2 satellites because it failed audits. However, the audit is 100% correct on the other 4 satellites.

My question is how should I proceed? Will the 2 satellites I’ve disqualified on eventually take me back? Will they be replaced by other sats? Am I better off killing the node and starting fresh or should I keep it in service for the 4 sats that respect my node?

Thanks!

Alexey · October 10, 2020, 9:05am

No. The DQ is permanent and not reversible.

Maybe we would have a new satellites, but they will not replace the DQed ones, the lasts will remain.

This is up on you. However, you will still be paid for data and traffic of customers of remained satellites if you would leave it running.
Also, you can invoke a Graceful Exit from the remained satellites when your node will be eligible to do so (after 6 months in the network for now, but the initial terms - after 15 months).