Error creating tables for master database on storagenode: migrate: duplicate column name:

Pernicio · March 12, 2021, 7:30pm

Hi,

I tried to move the databases to another location by changing the path in the config file and manually moving the database files to the other drive. Of course, it didn’t work like that, and I had forgotten to back up the database files, so I moved everything back to the original folder.

I ran all the database-fixing steps from here: https://support.storj.io/hc/en-us/articles/360029309111-How-to-fix-a-database-disk-image-is-malformed-

After that, the following problem still persists:

2021-03-12T21:50:22.651+0200 FATAL Unrecoverable error {"error": "Error creating tables for master database on storagenode: migrate: duplicate column name: trash\n\tstorj.io/storj/private/migrate.SQL.Run:292\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:197\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:196\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:346\n\tmain.cmdRun:193\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:64\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "errorVerbose": "Error creating tables for master database on storagenode: migrate: duplicate column name: 
trash\n\tstorj.io/storj/private/migrate.SQL.Run:292\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:197\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:196\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:346\n\tmain.cmdRun:193\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:64\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n\tmain.cmdRun:195\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:842\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:950\n\tgithub.com/spf13/cobra.(*Command).Execute:887\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:64\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

Thank you for your assistance.

SGC · March 12, 2021, 8:54pm

not that i can really read those errors, but it sounds like your database is still messed up.

sounds to me like it has an extra column or something… i duno…

this looks useful… tho… it’s the database… not really something one should tinker with…
it’s a bit like doing brain surgery with a spoon… i’m sure it could work if one is qualified… but if not… well…

so you most likely want somebody that knows what they are doing.
like @Alexey
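
For anyone reading along: the text "duplicate column name: trash" is SQLite's own error for adding a column that already exists. Below is a minimal Python sketch that reproduces the same message by applying one migration step twice. The `pieces` table and the `ALTER TABLE` statement are made up for illustration and are not the storagenode's real schema or migration. One plausible reading, which is only an assumption and not confirmed in this thread, is that the repaired database already contained the column while its recorded migration version was older, so the step ran again.

```python
import sqlite3

# In-memory database so nothing on disk is touched.
conn = sqlite3.connect(":memory:")

# Hypothetical table, NOT the real storagenode schema.
conn.execute("CREATE TABLE pieces (id INTEGER PRIMARY KEY)")

# Hypothetical migration step that adds a "trash" column.
migration = "ALTER TABLE pieces ADD COLUMN trash INTEGER NOT NULL DEFAULT 0"

conn.execute(migration)  # first run succeeds

try:
    conn.execute(migration)  # second run hits the already-existing column
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
conn.close()
```

The point is only that the schema and the migration bookkeeping disagree. It does not say which of the database files is affected.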

Pernicio · March 12, 2021, 9:31pm

Thank you for the link. I think it is faster to just delete all the databases. Do you have any idea whether the node really just rebuilds the DBs? Or should the original stored files be deleted along with them?

I made backups of the corrupted DB files, so it might be possible to restore them at some point if that would be better for the network.

Pac · March 12, 2021, 10:29pm

Hello @Pernicio and welcome

A node can be recovered even if all databases were to be lost, but on the other hand you should not delete any stored data files: this could lead to disqualification.

If only some db files are lost, here is how they can be recreated:

As a last resort, if everything else fails, deleting all database files should work. You would lose all statistics for the current month, and possibly some of your earnings for that month too, but at least the node would be “saved”.

Pernicio · March 13, 2021, 9:28pm

Thank you for your assistance in the matter.

Now that I have run the node with clean DBs (I deleted them all and let the storagenode create new ones), my Dashboard shows almost all of the 6 terabytes as free, even though I didn’t delete the storage folders. Does this fill back up to normal only once all the pieces have been deleted and replaced by the users of the Tardigrade service, which could take several months?

I’m also getting these errors (177 in total within 24 hours). Can they just be ignored after a complete database wipe?

2021-03-13T22:39:45.565+0200	ERROR	piecestore	download failed	{"Piece ID": "ECKNTDIA6RSZANKRMVDPVHAHDN7B4OEISFXBEQEIFHDKKJQYJNSA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 192.168.1.101:28967->135.181.29.51:59644: wsasend: An existing connection was forcibly closed by the remote host.", "errorVerbose": "write tcp 192.168.1.101:28967->135.181.29.51:59644: wsasend: An existing connection was forcibly closed by the remote host.\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1118\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-03-13T22:42:18.299+0200	ERROR	piecestore	upload failed	{"Piece ID": "JDZQ56JB5DVW4RBL7HTXJUHGTZJIIR2G7QSC3RHJD2UP5JXUJAKQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "unexpected EOF", "errorVerbose": "unexpected EOF\n\tstorj.io/common/rpc/rpcstatus.Error:82\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:325\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1025\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:29\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51", "Size": 532480}

2021-03-13T01:26:30.303+0200	ERROR	blobscache	satPiecesContentSize < 0	{"satPiecesContentSize": -2307840}

Thank you for your time in trying to help me be wiser on these issues.

Edit: Just after writing this message and refreshing the Dashboard one final time, the free space indicator shows a correct value for how much space is in use, so it took about 24 hours to correct itself.

Pac · March 13, 2021, 10:38pm

The thing is, when you start the node it browses all stored files to refresh its databases and to work out more or less precisely how much space is taken on your drive. In your case I assume it had no idea how much space was taken before browsing it all, because it had no previous data (as you had no more databases). This happens every time a node starts (or gets updated), whether or not your databases were lost. It’s usually not a problem, because while data is being browsed the dashboard can display the previous state of the node, saved in the databases.
This browsing process can take a long time depending on the type of drive (CMR, SMR…), how much data it’s holding, its performance, etc. When I say a long time, my slowest node takes almost 30 hours to browse all of the 2TB of data it’s storing…
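
To give a feel for why this takes so long, here is a rough Python sketch of that kind of scan: walk every file under a folder and add up the sizes. It is not the storagenode's actual code. The function name `used_space_bytes` and the `blobs` folder layout in the demo are assumptions made for illustration.

```python
import os
import tempfile


def used_space_bytes(root):
    """Sum the sizes of all regular files below root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File vanished or is unreadable; skip it in this sketch.
                pass
    return total


# Tiny demo on a throwaway directory with two made-up "pieces".
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "blobs", "aa"))
    with open(os.path.join(root, "blobs", "aa", "piece1"), "wb") as f:
        f.write(b"\0" * 1024)
    with open(os.path.join(root, "blobs", "piece2"), "wb") as f:
        f.write(b"\0" * 2048)
    demo_total = used_space_bytes(root)

print(demo_total)  # 3072
```

Every file costs at least one metadata lookup, so with millions of small pieces the run time is dominated by how fast the drive handles random reads, not by the total number of bytes.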

Having upload and download errors is not necessarily a problem, because some of them fail “naturally”: when you lose a race against other nodes that replied faster than you, the download or upload is cancelled.

I’m not sure about the third one though
@kevink saw the same thing there:

No definitive answer so far on what this error is about.

If these happen only from time to time, if your node is getting ingress & egress, and if your scores look and stay good (>95%), I wouldn’t worry too much about these.
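
One way to check whether errors really are occasional is to tally them per message. The Python sketch below assumes the log lines are tab-separated as in the lines pasted above (timestamp, level, component, message, JSON details). The function name `tally_errors` is made up, and the sample lines are shortened, illustrative entries and not real log output.

```python
from collections import Counter


def tally_errors(lines):
    """Count ERROR lines per (component, message); hypothetical helper."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Expected layout (assumption): timestamp, level, component, message, details
        if len(fields) >= 4 and fields[1] == "ERROR":
            counts[(fields[2], fields[3])] += 1
    return counts


# Illustrative sample lines with the JSON details replaced by {}.
sample = [
    "2021-03-13T22:39:45.565+0200\tERROR\tpiecestore\tdownload failed\t{}",
    "2021-03-13T22:42:18.299+0200\tERROR\tpiecestore\tupload failed\t{}",
    "2021-03-13T22:45:00.000+0200\tINFO\tpiecestore\tuploaded\t{}",
    "2021-03-13T22:50:00.000+0200\tERROR\tpiecestore\tdownload failed\t{}",
]

counts = tally_errors(sample)
for (component, message), n in counts.most_common():
    print(component, message, n)
```

To run it on a real log, pass the open file object in place of `sample`. Comparing the error counts with the number of successful transfers over the same period gives a better picture than the raw total alone.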

If in doubt, it doesn’t hurt to run a full check on the disk to make sure it has no bad sectors.