"satellites" table missing on the database after unexpected shutdown

I think you're right, something significant has happened to the database. Another schema error:

2020-08-04T09:00:29.204Z INFO Telemetry enabled
2020-08-04T09:00:29.210Z INFO db.migration Database Version {"version": 43}
Error: Error during preflight check for storagenode databases: storage node preflight database error: storage_usage: expected schema does not match actual:   &dbschema.Schema{
-  Tables: []*dbschema.Table{
-  s"Name: storage_usage\nColumns:\n\tName: at_rest_total\n\tType: REAL\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: interval_start\n\tType: TIMESTAMP\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: satellite_id\n\tType: BLOB\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\nPrimaryKey: interval_start satellite_id\nUniques:\n\t",
-  },
+  Tables: nil,
   Indexes: nil,
  }

storj.io/storj/storagenode/storagenodedb.(*DB).Preflight:346
main.cmdRun:193
storj.io/private/process.cleanup.func1.4:353
storj.io/private/process.cleanup.func1:371
github.com/spf13/cobra.(*Command).execute:840
github.com/spf13/cobra.(*Command).ExecuteC:945
github.com/spf13/cobra.(*Command).Execute:885
storj.io/private/process.ExecWithCustomConfig:88
storj.io/private/process.ExecCustomDebug:70
main.main:323
runtime.main:203

The schemas I'm using are from v1.9.5. Are you already on that version? It could help to update if not.

I think I'm on the same version because I downloaded the latest Docker image from the official Docker Hub repository yesterday. Is there any way to know exactly which version I'm running when I use a Docker container?

The Docker image has been updated, so running docker pull storjlabs/storagenode:latest would do it. Make sure your run command also uses the :latest tag.
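To check which image a running container was actually created from, something like this works (assuming the container is named storagenode; adjust the name to your setup):

docker ps -a --format '{{.Names}}\t{{.Image}}\t{{.ID}}'    # which image tag each container uses
docker inspect --format '{{.Image}}' storagenode           # image ID the container was created from
docker images storjlabs/storagenode                        # locally pulled image IDs and tags

If the image ID from docker inspect doesn't match the freshly pulled one, the container is still based on the old image and has to be removed and re-created.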

I removed the container and the image. Then I pulled the "latest" tagged image (Image ID: 18b91207b1ab) using the command you typed. After that I ran the container and got the same schema error.

Oh whoops, I overlooked that it's actually a different schema. Let's just keep going then, I guess.

Rename storage_usage.db (see the shell example after the SQL), then recreate it:

sqlite3 storage_usage.db

CREATE TABLE storage_usage (
    satellite_id BLOB NOT NULL,
    at_rest_total REAL NOT NULL,
    interval_start TIMESTAMP NOT NULL,
    PRIMARY KEY (satellite_id, interval_start)
);

.quit
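For the rename step at the top, something along these lines, with the path a placeholder for wherever your databases live (keeping the old file as a backup rather than deleting it); running sqlite3 on the original filename afterwards creates a fresh, empty database:

mv /path/to/storage/storage_usage.db /path/to/storage/storage_usage.db.bak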

In the meantime, I just want to double-check. Is it possible that you changed the path to storage and perhaps created new, empty db files? It seems every db file is broken, which is highly unlikely to happen on a sudden shutdown, as a lot of these db's are written to very infrequently.

I'm thinking you may have created empty db files by trying to do a consistency check on them. With that in mind, you should probably check all paths and have a look at the file sizes as well as the size of the corresponding blobs folder. If we start the node with a wrong path or no data after finally recreating all the db's, your node will be disqualified within minutes. So let's make sure to check that first.
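A quick way to check both, assuming the databases and blobs live under /path/to/storage (a placeholder for your actual mount point):

ls -lh /path/to/storage/*.db     # db file sizes; freshly created, empty databases are tiny compared to ones with history
du -sh /path/to/storage/blobs    # total size of the stored customer data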

I checked the paths and they are fine. The blobs folder has the expected size, so I tried something.

I moved all the db files to another temporary folder and then ran the node without them. This is what I get:

2020-08-04T11:05:26.567Z INFO Telemetry enabled
2020-08-04T11:05:27.086Z INFO db.migration Database Created {"version": 43}
2020-08-04T11:05:27.662Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-08-04T11:05:28.547Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-08-04T11:05:28.547Z INFO bandwidth Performing bandwidth usage rollups
2020-08-04T11:05:28.547Z INFO trust Scheduling next refresh {"after": "6h56m50.354746389s"}
2020-08-04T11:05:28.548Z INFO Node 1v7Zw-------------------------------------------eqY1 started
2020-08-04T11:05:28.548Z INFO Public server started on [::]:28967
2020-08-04T11:05:28.548Z INFO Private server started on 127.0.0.1:7778
2020-08-04T11:05:29.528Z INFO piecestore download started {"Piece ID": "QYEXVYV4OZOHDQBFCL3EV2Z2AAHUNRLUAHFIOCPB4VLRUEY45YPA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2020-08-04T11:05:30.019Z INFO piecestore downloaded {"Piece ID": "QYEXVYV4OZOHDQBFCL3EV2Z2AAHUNRLUAHFIOCPB4VLRUEY45YPA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2020-08-04T11:05:30.019Z INFO piecestore downloaded {"Piece ID": "QYEXVYV4OZOHDQBFCL3EV2Z2AAHUNRLUAHFIOCPB4VLRUEY45YPA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2020-08-04T11:05:32.251Z INFO piecestore download started {"Piece ID": "3Z5IOFWZYM2XLDGZGQW2CFV43UFYDN22XQZ3S63JSYXHE5PDAGQA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2020-08-04T11:05:32.418Z INFO piecestore downloaded {"Piece ID": "3Z5IOFWZYM2XLDGZGQW2CFV43UFYDN22XQZ3S63JSYXHE5PDAGQA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2020-08-04T11:05:34.701Z INFO piecestore download started {"Piece ID": "YATRQLVNS25AWAQVHUIAT7JPRMJIMBKBZW6GXTINXBSAHSA7MVNQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2020-08-04T11:05:34.936Z INFO piecestore download started {"Piece ID": "OOOZ7ECRKBKXSPHZNUIUOGJ7I3QAWOPLNAR4H6VAXOOZ6TOR5IHA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2020-08-04T11:05:36.158Z INFO piecestore download started {"Piece ID": "F4BD3ZNYC67TZIR3GTOEUBTRFZIARS4PZ5NNPP4KKTN6MVMH7EQQ", "Satellite ID":
..... 

After a few seconds I stopped the node, fearing disqualification. I checked the folders again and the db files had been regenerated. What conclusion can I draw from this? Could the node regenerate the database files and run normally? Or will the node be disqualified after some internal checks?

This may actually be your simplest solution. You can have a look at the original db files that you backed up and the ones you hadn’t yet changed. I’m guessing they are all very small.

We were basically doing one by one what the node has now done automatically. When there are no db files, the node recreates them as empty. You'd be missing all your old stats, so the dashboard will likely show wrong data. But we were kind of heading that way anyway.

The upside is that the db data is all non-essential. The most important part is that the actual customer data is still there. And judging from the successful downloads in the logs you showed, it is.

So we could continue going one by one, but it seems we would effectively end up with empty new databases one way or the other. You might as well keep it running with the new automatically created databases.

The node is running and everything looks all right after watching the logs. There are no errors, so I am happy with it. The statistics data has been lost and is not shown on the dashboard, but it does not matter.

I will keep an eye on it for 24 hours and, if everything is fine, I will close the topic as solved.

Thank you @BrightSilence and @alexey for your time. I appreciate the work you do in this community. It would not be the same without people like you.

Thank you very much.


Hi all,
I have the same problem after an auto update. What can be done?

2020-08-04T13:37:49.950Z INFO Telemetry enabled
2020-08-04T13:37:49.957Z INFO db.migration Database Version {"version": 43}
Error: Error during preflight check for storagenode databases: storage node preflight database error: orders: expected schema does not match actual:   &dbschema.Schema{
   Tables: []*dbschema.Table{&{Name: "order_archive_", Columns: []*dbschema.Column{&{Name: "archived_at", Type: "TIMESTAMP"}, &{Name: "order_limit_serialized", Type: "BLOB"}, &{Name: "order_serialized", Type: "BLOB"}, &{Name: "satellite_id", Type: "BLOB"}, &{Name: "serial_number", Type: "BLOB"}, &{Name: "status", Type: "INTEGER"}, &{Name: "uplink_cert_id", Type: "INTEGER", Reference: &dbschema.Reference{Table: "certificate", Column: "cert_id"}}}}, &{Name: "unsent_order", Columns: []*dbschema.Column{&{Name: "order_limit_expiration", Type: "TIMESTAMP"}, &{Name: "order_limit_serialized", Type: "BLOB"}, &{Name: "order_serialized", Type: "BLOB"}, &{Name: "satellite_id", Type: "BLOB"}, &{Name: "serial_number", Type: "BLOB"}, &{Name: "uplink_cert_id", Type: "INTEGER", Reference: &dbschema.Reference{Table: "certificate", Column: "cert_id"}}}}},
   Indexes: []*dbschema.Index{
     &{Name: "idx_order_archived_at", Table: "order_archive_", Columns: []string{"archived_at"}},
     &{
       Name:    "idx_orders",
       Table:   "unsent_order",
       Columns: []string{"satellite_id", "serial_number"},
-      Unique:  true,
+      Unique:  false,
       Partial: "",
     },
   },
  }

Let's investigate first instead of going down that rabbit hole again.

Could you show the result of:

sqlite3 /path/to/orders.db .schema
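If the schema output looks off, it can also be worth checking whether the file itself is corrupted before changing anything; this is a standard SQLite check, not something specific to storagenode:

sqlite3 /path/to/orders.db "PRAGMA integrity_check;"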

I have already solved the problem, thanks.

Could you please share how so others may use the same method?

Just to be sure, I don’t recommend just starting over with new db files unless you have to. I don’t want to give people the idea that that is usually a good solution.

Hey @FoRTu,

I understand that this post was indeed the solution for your situation, but if you mark this one as the solution could you add something to the post to caution people that this is only a last resort if the data from the databases is already lost? In most cases it’s much better to fix the specific db file that’s having an issue and I wouldn’t want other SNOs just throwing away their db’s.

You are right @BrightSilence.

In my case the solution was to stop the node and remove all the .db files from the storage folder. Then I started the node again, it regenerated all the .db files, and it started working fine, like before.
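Roughly, with Docker, that looks like the commands below; storagenode, /path/to/storage and /path/to/db-backup are placeholders, and the files are moved rather than deleted so they can still be inspected later:

docker stop -t 300 storagenode                  # give the node time to shut down cleanly
mkdir -p /path/to/db-backup
mv /path/to/storage/*.db /path/to/db-backup/    # keep the old databases instead of deleting them
docker start storagenode                        # the node recreates empty .db files on startup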

The downside of removing the .db files is that the statistics and dashboard information were lost. It is preferable to restore the .db files one by one, as @BrightSilence said, instead of removing them all.

This worked in my case, but not all issues with the .db files are the same. Use this topic as useful information, not a definitive solution, and before following the steps I followed, explain your case to the community.
