grate: v58: database disk image is malformed\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271", "errorVerbose": "Error migrating tables for database on storagenode: migrate: v58: database disk image is malformed\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271\n\tmain.cmdRun:102\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271"}
Error: Error migrating tables for database on storagenode: migrate: v58: database disk image is malformed
storj.io/storj/private/migrate.(*Migration).Run:212
storj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425
main.cmdRun:100
main.newRunCmd.func1:33
storj.io/common/process.cleanup.func1.4:393
storj.io/common/process.cleanup.func1:411
github.com/spf13/cobra.(*Command).execute:983
github.com/spf13/cobra.(*Command).ExecuteC:1115
github.com/spf13/cobra.(*Command).Execute:1039
storj.io/common/process.ExecWithCustomOptions:112
main.main:34
runtime.main:271
2024-07-23 06:00:31,636 INFO stopped: storagenode (exit status 1)
I seem to have a DB error this morning - has this been a common issue?
I wonder if this is a feature/bug of the new version
My little script might help you.
Thanks for adding to this thread - hopefully it will raise more awareness of the issue occurring.
I have a few nodes, at least 4 are on 1.108.3 - this was the first one to have an issue… So I don't think it's with EVERY node… but it certainly might be with a node that has a slightly slower disk… maybe…
What was the outcome of your fix? I would assume the .db was corrupted prior to the upgrade task. Are you sure you didn't cut it off thinking it was unresponsive? Did it run out of memory and get killed once before that logging?
People should be informed that migration can take a while on a slow filesystem; patience is needed.
Anyway, it sounds like prior corruption was then triggered by the upgrade; however, was your prior version 1.104.5 or 1.105.4? Because I think each recent version did a DB update, which would cause corruption going from 1.104.5 to 1.108.3, or maybe even the short-lived 1.107.x.
Kewl that you scripted that & shared it with the community, that's great!
I'm sorry, I thought that you had upgraded the node manually, for example to 1.109.x, and that it has now been downgraded automatically to the rolling version of 1.108.x.
This happens very often if the version contains a database migration and the node is then downgraded…
Same problem today, after restarting my server following a Windows update.
All nodes threw this error:
2024-07-25T16:25:51Z ERROR failure during run {"Process": "storagenode", "error": "Error checking version for storagenode database: validate db version mismatch: expected 26, but current version is 60\n\tstorj.io/storj/private/migrate.(*Migration).ValidateVersions:141\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).CheckVersion:671\n\tmain.cmdRun:103\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267", "errorVerbose": "Error checking version for storagenode database: validate db version mismatch: expected 26, but current version is 60\n\tstorj.io/storj/private/migrate.(*Migration).ValidateVersions:141\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).CheckVersion:671\n\tmain.cmdRun:103\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267\n\tmain.cmdRun:105\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267"}
2024-07-25T16:25:51Z FATAL Unrecoverable error {"Process": "storagenode", "error": "Error checking version for storagenode database: validate db version mismatch: expected 26, but current version is 60\n\tstorj.io/storj/private/migrate.(*Migration).ValidateVersions:141\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).CheckVersion:671\n\tmain.cmdRun:103\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267", "errorVerbose": "Error checking version for storagenode database: validate db version mismatch: expected 26, but current version is 60\n\tstorj.io/storj/private/migrate.(*Migration).ValidateVersions:141\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).CheckVersion:671\n\tmain.cmdRun:103\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267\n\tmain.cmdRun:105\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267"}
I did not play with beta versions or anything at all before that; the nodes had 200h to 400h of uptime before the reboot.
Fixed by making a database backup, deleting all databases, launching the node to create new databases, stopping the node, restoring the backup, and then replacing ONLY piece_expiration.db with the new empty version created earlier.
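For anyone who wants to follow the same sequence, here is a rough sketch of the file shuffling in Python (the paths are made up - adjust them to your node - and the node has to be stopped before each copy step and started once in between so it can recreate fresh, empty databases):

```python
import shutil
from pathlib import Path

# Hypothetical paths - point these at your own node's database and backup locations.
DB_DIR = Path("/mnt/storagenode/storage")
BACKUP_DIR = Path("/mnt/backup/storagenode-db")

BACKUP_DIR.mkdir(parents=True, exist_ok=True)

# Step 1: with the node stopped, back up every database file.
for db_file in DB_DIR.glob("*.db"):
    shutil.copy2(db_file, BACKUP_DIR / db_file.name)

# Step 2 (manual): delete the databases, start the node once so it creates
# new empty ones, then stop it again.

# Step 3: restore everything from the backup EXCEPT piece_expiration.db,
# keeping the freshly created empty copy of that one file.
for db_file in BACKUP_DIR.glob("*.db"):
    if db_file.name != "piece_expiration.db":
        shutil.copy2(db_file, DB_DIR / db_file.name)
```

The effect is that every database keeps its history except piece_expiration.db, which starts over empty.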
I only screwed up one node where I lost my backup; now the dashboard is telling me the drive is empty. What can I do?
Just restart the node with the scan on startup enabled (it's enabled by default); eventually it will fill the databases with the actual usage, but it may take days. Make sure that you do not have errors related to the filewalkers or the databases.
Unfortunately the piece_expiration.db will not be regenerated, so the TTL data will not be removed when it expires, but it should be collected by the Garbage Collector after a week and, after another week, removed permanently.
Yes, I'm wondering if this has affected nodes which don't have SUPER QUICK drives… Although my thought was that it affects the DBs, and if they are on SSD… that's more than fast enough… even if it's a USB stick.
My script to back up / repair the database probably would have fixed it here - the script I created fixed my piece_expiration… but I assume it would have cut off the last few rows which had an issue / incomplete dataset.
@Alexey - is this worth notifying the devs to look into this "potential" bigger issue as this gets rolled out further?
They are aware of some issues. This one has been reported as well. If it could be reproduced without any interventions, like an upgrade to the beta that was automatically rolled back, then it is a potential bug, either in the rollout process or in the updater.
If you mean the database migration incompatibility, then I believe we need to keep this check.
So I too woke up today to the FATAL v58 DB migrate failure on one node.
Windows GUI it is.
I began to repair the DB manually according to the official guide (How to fix a "database disk image is malformed").
So the only DB that did not return "ok" from the PRAGMA check was piece_expiration.db.
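For anyone checking their own node, a minimal sketch of that integrity check using Python's built-in sqlite3 module (the directory path is an assumption - point it at wherever your node keeps its *.db files, and run it with the node stopped):

```python
import sqlite3
from pathlib import Path

# Assumed location of the node's *.db files - adjust to your setup.
DB_DIR = Path("X:/storagenode/storage")

# Run PRAGMA integrity_check against every database and report anything
# that does not come back as "ok".
for db_file in sorted(DB_DIR.glob("*.db")):
    conn = sqlite3.connect(str(db_file))
    try:
        result = conn.execute("PRAGMA integrity_check;").fetchone()[0]
    except sqlite3.DatabaseError as exc:
        result = f"unreadable: {exc}"
    finally:
        conn.close()
    print(f"{db_file.name}: {result}")
```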
The file is ~500 MB,
dump_all.sql is ~130 MB,
dump_all_notrans.sql is exactly the same size, ~130 MB.
After finishing step 13,
piece_expiration.db is about 154 MB.
When I put it back into the storagenode, the node gives a new error, this time v26, something about a duplicate 'trash' column during migration.
Whatever - because, first of all, how am I supposed to repair a .db
if the node constantly tries to run itself every 2-3 minutes, despite me ordering it to be stopped? I also stopped the updater, but the node lives its own life! (producing the v26 error every few minutes)
So what now?
Here is the log:
2024-07-26T12:46:22+01:00 INFO Anonymized tracing enabled
2024-07-26T12:46:22+01:00 INFO Operator email {"Address": "******@****"}
2024-07-26T12:46:22+01:00 INFO Operator wallet {"Address": "0x3*******************"}
2024-07-26T12:46:23+01:00 INFO Telemetry enabled {"instance ID": "**************"}
2024-07-26T12:46:23+01:00 INFO Event collection enabled {"instance ID": "***********************"}
2024-07-26T12:46:23+01:00 INFO db.migration.26 Add Trash column to pieceExpirationDB
2024-07-26T12:46:23+01:00 ERROR failure during run {"error": "Error migrating tables for database on storagenode: migrate: v26: duplicate column name: trash\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "errorVerbose": "Error migrating tables for database on storagenode: migrate: v26: duplicate column name: trash\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n\tmain.cmdRun:102\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-26T12:46:23+01:00 FATAL Unrecoverable error {"error": "Error migrating tables for database on storagenode: migrate: v26: duplicate column name: trash\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "errorVerbose": "Error migrating tables for database on storagenode: migrate: v26: duplicate column name: trash\n\tstorj.io/storj/private/migrate.(*Migration).Run:212\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:425\n\tmain.cmdRun:100\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n\tmain.cmdRun:102\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-26T12:46:26+01:00 INFO Configuration loaded (...and it just goes in a loop every 2 minutes or so)
(P.S. I have 1 or 2 other Windows nodes working on 1.108.3, online for about 29h I see, and there seems to be no problem there. But please don't roll out this 1.108.3 - I'm not going to run around fixing my nodes like a dog now. It's labeled pre-release, but it has still been rolled out to my node (almost all my nodes are on 1.108.3, I checked, and only this one has a problem so far), while the latest on GitHub still shows 1.105.4.)
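For what it's worth, the v26 step in the log above is the "Add Trash column to pieceExpirationDB" migration, so the "duplicate column name: trash" failure likely means the dump/restore recreated a schema that already contains that column while the recorded migration version says otherwise. A quick way to inspect what the restored piece_expiration.db actually contains (just a sketch - the piece_expirations and versions table names are assumptions based on that log message and on how the migrations track state):

```python
import sqlite3

# Hypothetical path - point it at your restored copy of the database.
conn = sqlite3.connect("piece_expiration.db")

# Which tables did the dump/restore actually recreate?
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table';")]
print("tables:", tables)

# Does the piece expiration table already have the 'trash' column
# that migration v26 tries to add?
if "piece_expirations" in tables:
    cols = [row[1] for row in conn.execute("PRAGMA table_info(piece_expirations);")]
    print("piece_expirations columns:", cols)

# What migration version does the database itself claim to be at?
if "versions" in tables:
    version = conn.execute("SELECT MAX(version) FROM versions;").fetchone()[0]
    print("recorded migration version:", version)

conn.close()
```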
I got stuck at the v59 migration. I edited the DB by looking at the code for the v59 DB changes. I then started the node and the migration completed successfully. Now I am stuck with collector warnings about files not found.
2024-07-26T12:26:03Z WARN collector unable to delete piece { "Satellite ID": 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE "Piece ID": "DXVIUNDADUW3FNLT6YPUCGUU25CZXQRNDWNAGUOCBKVXAWHFK45Q", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:122\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:290\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).DeleteWithStorageFormat:270\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0:356\n\tstorj.io/storj/storagenode/collector.(*Service).Collect.func1:88\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:71\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:576\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:87\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:65\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:61\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-26T12:26:03Z WARN collector unable to delete piece { "Satellite ID": 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE "Piece ID": "ZN7SDB2D2DFTQNFEMEPWWGBNV2JG7RSM2WC2LZYRXXOIGRZHDUWA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:122\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:290\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).DeleteWithStorageFormat:270\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0:356\n\tstorj.io/storj/storagenode/collector.(*Service).Collect.func1:88\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:71\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:576\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:87\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:65\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:61\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
I checked the log and I can see piece IDs repeating in the file-not-found entries. The log is at info level and growing at a rate of 150 MB per hour.
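If you want to see how often the same pieces come back, a quick log-counting sketch (it assumes the log lines look like the collector WARN entries above, with a quoted "Piece ID" field; the log path is hypothetical):

```python
import re
from collections import Counter

# Matches the "Piece ID": "..." field in the collector WARN lines.
piece_id_re = re.compile(r'"Piece ID":\s*"([A-Za-z0-9]+)"')

counts = Counter()
with open("storagenode.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "unable to delete piece" in line and "file does not exist" in line:
            match = piece_id_re.search(line)
            if match:
                counts[match.group(1)] += 1

# Show the ten pieces that repeat most often.
for pid, n in counts.most_common(10):
    print(n, pid)
```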
As a last resort, I am thinking of replacing piece_expiration.db with a blank one and living with the leftover TTL data for a while.