What happens if I remove the databases?

Hi guys, I just got a lovely error today:
2022-09-08T10:09:35.950+0200 FATAL Unrecoverable error {“error”: “Error starting master database on storagenode: database: piece_expiration opening file "D:\\piece_expiration.db" failed: disk I/O error: The file or directory is corrupted and unreadable.\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:324\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:306\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:248\n\tmain.cmdRun:193\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tstorj.io/private/process.ExecWithCustomConfig:75\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:61\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”, “errorVerbose”: “Error starting master database on storagenode: database: piece_expiration opening file "D:\\piece_expiration.db" failed: disk I/O error: The file or directory is corrupted and unreadable.\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:324\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:306\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:248\n\tmain.cmdRun:193\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tstorj.io/private/process.ExecWithCustomConfig:75\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:61\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n\tmain.cmdRun:195\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tstorj.io/private/process.ExecWithCustomConfig:75\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:61\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}

The VM crashed today and was unreachable via RDP and via the virtual monitor, so I had to hard power it off. After powering it back on, this error came up.

I’m wondering, since storj recreates the databases after they are deleted: if I delete them and they are remade, will the node function or will it get DQ’d? Do the databases hold anything important?

Thanks

You can do that eventually, but try fixing the file system issue first. Stop the node and run chkdsk on the HDD.

You can also follow this article to recreate only the corrupted database (if that’s the case): https://support.storj.io/hc/en-us/articles/4403032417044-How-to-fix-database-file-is-not-a-database-error , but it’s better to fix the filesystem first.
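
To find out which database is actually damaged before recreating anything, a minimal Python sketch along these lines can help. It assumes the node is stopped and that the databases are ordinary SQLite files ending in .db; the function name check_databases and the directory you pass in are illustrative, not part of the storagenode software. It only runs SQLite's built-in PRAGMA integrity_check and does not repair anything.

```python
import sqlite3
from pathlib import Path


def check_databases(db_dir):
    """Run SQLite's integrity check on every *.db file in db_dir.

    Returns a dict mapping file name to "ok", to the first problem
    reported by SQLite, or to "error: ..." if the file cannot be read.
    """
    results = {}
    for db_path in sorted(Path(db_dir).glob("*.db")):
        # Open read-only via a file URI so the check cannot modify the file.
        uri = db_path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                row = conn.execute("PRAGMA integrity_check;").fetchone()
                results[db_path.name] = row[0]
            finally:
                conn.close()
        except sqlite3.Error as exc:
            results[db_path.name] = f"error: {exc}"
    return results
```

Calling check_databases on the folder that holds the .db files (D:\ in the log above) returns one entry per database. Anything other than "ok" is a candidate for the recovery steps in the linked article.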

I have never been DQ’d for deleting the db, wal and shm files, and I have done it several times due to power cuts and failing disks.
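
If you go the route of clearing out the db, wal and shm files, a slightly safer variant is to move them aside instead of deleting them, so they can be put back later. The sketch below is only an illustration under a few assumptions: the node is stopped, the files follow the usual SQLite naming (name.db, name.db-wal, name.db-shm), and move_db_files_aside and the backup folder are made-up names.

```python
import shutil
from pathlib import Path


def move_db_files_aside(db_dir, backup_dir):
    """Move *.db, *.db-wal and *.db-shm files from db_dir into backup_dir.

    Returns the list of file names that were moved.
    """
    src = Path(db_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in ("*.db", "*.db-wal", "*.db-shm"):
        for path in sorted(src.glob(pattern)):
            # Keep the original file name so the file can be restored as-is.
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

Moving the files back into place (again with the node stopped) restores the previous state if the fresh databases turn out not to be needed.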

Several times? What kind of setup did that to you? I’ve had several nodes for 2 years and haven’t lost a single database file yet. Had some power cuts in the meantime.

So my question, now that I’m waiting for the disk to get scanned: what do the databases hold? I never investigated that; it’s more of a curiosity.

They mostly hold data for reporting on the dashboard. Things like bandwidth used, storage used and payout info. They used to hold piece metadata, but that has been moved to the pieces themselves. So it’s all non-critical right now. I believe the only remaining db with a function to the operation of the node is the piece expiration db, which holds expiration dates for pieces. Losing that means your node doesn’t immediately remove expired pieces. However, those pieces will eventually be cleaned up by garbage collection as well.

So you’ll lose stats, mostly historic ones. But information like held amount might never be correct again, because the node is missing that information for previous months. It’s not ideal to lose the info, but it won’t break your node if you start over with clean db’s. I recommend following the info on the page @Alexey linked as that instructs you on how to recover only the corrupted db’s, limiting loss of information.

Ok.
The first time was moving from PC to pi4 (I think). I forgot the --delete step of the rsync and the WALs corrupted the dbs. Donald helped me out there.

Last time was when I ran out of inodes due to using a disk from chia, that I formatted as largefile4. It complained about the orders db so I deleted that, then all the dbs then all the files in orders. Then it worked again for a while, rinse and repeat until I managed to move to a brand new 18tb. B.S. helped out there.

In between those I can’t remember specifics but I had an NVM ssd that did weird short writes where you could read the whole file but not each block individually. That was weird. That node got moved to a wd elements and spent a few months almost DQ but survived.

Deleting the dbs does not cause DQ.

Hey guys, I need a bit more help.

I’m getting this fatal error:

2022-09-14T17:46:37.636+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: disk space requirement not met”, “errorVerbose”: “piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:125\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}

I don’t get it; the drive is 5 TB in size.

All help is warmly welcome.

Thanks

Small update: the node started on the second attempt, but I am getting these errors now. Should I be worried?

2022-09-14T17:50:25.634+0200 ERROR collector unable to delete piece {“Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Piece ID”: “CUKWIXQUNUEQYQSF7AJJ6GDML5FEN4X5CRDXJKLR5DQVUCLNX2SQ”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:226\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}

Thanks

If piece_spaced_used.db is removed or recreated, the node no longer knows how much space it already uses. If there is less than 500 GB of free space on the disk, you see this error as a result.

This should resolve itself after the filewalker runs once. But if not you can change the minimum requirement in the config.yaml. The setting is storage2.monitor.minimum-disk-space
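
To see how much free space the disk actually reports against the 500 GB figure mentioned above, a quick Python sketch like this can be used. It is a simplified illustration, not the node's own check: the function names are made up, 500 GB is treated as decimal gigabytes by assumption, and the real monitor also factors in space the node already uses.

```python
import shutil

# 500 GB as quoted in the thread; decimal units are an assumption here.
MIN_FREE_BYTES = 500 * 10**9


def meets_minimum(free_bytes, minimum_bytes=MIN_FREE_BYTES):
    """Return True if free_bytes is at least the configured minimum."""
    return free_bytes >= minimum_bytes


def check_path(path, minimum_bytes=MIN_FREE_BYTES):
    """Return (free_bytes, ok) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return usage.free, meets_minimum(usage.free, minimum_bytes)
```

Running check_path on the storage drive shows whether a freshly recreated space-used database would leave the node below the minimum until the filewalker has finished.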

Seeing these too lately. I’m not sure why, but they appear on all my nodes now. As long as it’s the collector it’s not a problem. The piece probably got deleted the normal way before the collector got to it. Still annoying though.

Thanks for the quick response, much appreciated.

I will keep monitoring this node for a few hours to see if anything changes.

I have the same question. My node is almost 1 year old, and I found this error appearing in the logs:

ERROR	bandwidth	Could not rollup bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed"

I don’t know how long this has been going on. I tried to recover it with the manual linked above, but with no success. Can I just delete the faulty file? What will happen after that? Will my node be suspended or disqualified?

OK, so just to summarize, the good news first I suppose. The DBs weren’t actually corrupted. YEY

So what caused it?

Well, it seems the virtual drive crashed because the VM crashed. I tried to remove all the DBs, but one temp file was stuck open. Note that it was about 500 MB, so not the smallest one. Running the disk scan utility in Windows, as recommended by @BrightSilence, found and corrected a few errors. This also got rid of the stuck temp file. After the temp file was removed, the node would respond to the start signal but would still fail.

At this point I was disappointed and frustrated, so I don’t fully remember what I did, but I more or less put all the DBs back, and after a bit of back and forth the node started with all data working, and even the web GUI loaded with all valid data. Thanks again to all who helped me out. This was one of my oldest nodes (my 3rd one), and I wouldn’t want to lose it.

Thanks

That’s good news, glad you got it working again!

We have a fix for this in the works.

https://review.dev.storj.io/c/storj/storj/+/8394
