Hello everybody. I would like to give everyone who migrates their nodes the following tip:
Manually back up your databases, or skip step 8 until your node is running again.
Step 8 in the documentation deletes all data from the old hard drive.
I only delete data once everything is running again, so I skipped this step.
After running rsync several times, I started the new node and my log was full of:
"failed to add bandwidth usage {“error”: “bandwidthdb error: database disk image is malformed”, “errorVerbose”: “bandwidthdb error: database disk image is malformed”
I stopped the node immediately, deleted the databases and copied them back from my manual backup. Now the node runs wonderfully again.
I hope that I will not be disqualified or suspended due to the 25 errors in the log.
I ran rsync several times, then turned off the node and ran rsync two more times.
So don’t rely on rsync alone and back up the databases manually.
I think that saved me a lot of work and time.
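For anyone who still wants a manual copy before migrating, here is a minimal Python sketch using the standard library's sqlite3 backup API. It assumes the node is stopped and that its databases are the *.db files in a single directory; the function name backup_databases and the directory arguments are hypothetical placeholders, not part of the storage node software.

```python
import sqlite3
from pathlib import Path
from typing import List


def backup_databases(db_dir: str, backup_dir: str) -> List[str]:
    """Copy every *.db file in db_dir into backup_dir via SQLite's backup API.

    Assumption: the storage node is stopped, so nothing else writes to the files.
    After dst.close() each copy is a complete .db file on its own, so the
    -wal/-shm companions from the source directory are not needed.
    """
    src_root = Path(db_dir)
    dst_root = Path(backup_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for db_path in sorted(src_root.glob("*.db")):
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(dst_root / db_path.name)
        try:
            src.backup(dst)  # consistent page-by-page copy of the whole database
        finally:
            dst.close()
            src.close()
        copied.append(db_path.name)
    return copied
```

Using the backup API instead of a plain file copy means any committed data still sitting in a -wal file is included in the copy.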
This is not true. It deletes files from the DESTINATION that should no longer be there. The old drive is not touched.
- Run the copying command with a --delete parameter to remove deleted files from the destination:
This is because you skipped step 8. You need to run rsync once with the node stopped. If you don't, you're running on a copy of the databases from when they were in live use. If you do, but without --delete, you're running with the latest .db files but outdated -shm and -wal files, which can cause all kinds of other issues. You may also be missing pieces now. It's not a good idea to deviate from the instructions; they've been well thought through.
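To see what a final pass with --delete would clean up, here is a small Python sketch that lists files present in the destination but missing from the source, and picks out leftover SQLite -wal/-shm files among them. It assumes both directories are local paths mirrored the way rsync -a source/ destination/ mirrors them; the function names are hypothetical, and this is only a check, not a replacement for running rsync itself.

```python
from pathlib import Path
from typing import List


def extraneous_files(source: str, destination: str) -> List[str]:
    """Relative paths of files under destination that do not exist under source.

    These are files an rsync pass with --delete would remove. Only files are
    compared here; rsync --delete would also remove extra directories.
    """
    src_root, dst_root = Path(source), Path(destination)
    extra = []
    for path in dst_root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(dst_root)
            if not (src_root / rel).exists():
                extra.append(rel.as_posix())
    return sorted(extra)


def stale_sqlite_sidecars(source: str, destination: str) -> List[str]:
    """The subset of extraneous files that are SQLite WAL or shared-memory files."""
    return [p for p in extraneous_files(source, destination)
            if p.endswith(("-wal", "-shm"))]
```

rsync can report the same thing itself: running it with --delete, --dry-run and -v prints the files it would delete without removing anything.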
Ah, yes, you are right. My mistake; I was a bit stressed yesterday.
Nevertheless: I ran rsync several times, then two more times while the node was switched off.
That means everything was copied by rsync, including the databases. Despite that, I got the errors in the log when I started the node. After I switched off the node and copied the databases manually, the node ran. I can only tell you how it was. I would like to spare others from having to repair their databases, and this problem seems to come up often when migrating (see various posts here).
Either way, if I make a manual backup of the databases while the node is switched off and restore that backup before switching the node on again, the databases are in the same state as the ones rsync copied.
So something must have gone wrong when copying with rsync; otherwise, why do the databases work when I copy them manually? Again: the node has been off since the databases were backed up, i.e. rsync copied the same databases that I copied manually.
Yes, as mentioned in my previous post: while the node is running, there are .db-wal and .db-shm files for each database. These contain new data for the databases in use. When you stop the node, those are merged into the .db files. Because you never ran with the --delete option, the old temporary files stuck around even when you synced the new databases while the node was offline. This is likely what corrupted your databases.
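The WAL behaviour described above can be reproduced with Python's built-in sqlite3 module. This sketch assumes a default SQLite build (no persistent-WAL setting); the database and table names are made up for illustration and are not the node's actual schema.

```python
import os
import sqlite3
import tempfile


def wal_sidecars_demo() -> dict:
    """Show that -wal/-shm files exist while a WAL-mode database is open,
    and are checkpointed into the .db file and removed on a clean close."""
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "bandwidth.db")  # illustrative file name
        con = sqlite3.connect(db)
        con.execute("PRAGMA journal_mode=WAL")
        con.execute("CREATE TABLE usage (amount INTEGER)")
        con.execute("INSERT INTO usage VALUES (1)")
        con.commit()  # the committed row sits in the -wal file for now
        while_open = {
            "wal": os.path.exists(db + "-wal"),
            "shm": os.path.exists(db + "-shm"),
        }
        con.close()  # last connection closes: final checkpoint, sidecars deleted
        after_close = {
            "wal": os.path.exists(db + "-wal"),
            "shm": os.path.exists(db + "-shm"),
        }
        # The committed row is now stored in the .db file itself.
        con = sqlite3.connect(db)
        rows = con.execute("SELECT amount FROM usage").fetchall()
        con.close()
    return {"while_open": while_open, "after_close": after_close, "rows": rows}
```

A copy taken while the database is open therefore needs the matching -wal file to be complete, while a copy taken after a clean shutdown does not; a -wal file left over from an earlier copy no longer matches the new .db file.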
After this, those files were likely cleaned up automatically and merged into the .db files… which is not really a great idea since they didn’t contain any data that still had to be merged. But it also made the problem much less visible to you. Replacing the now corrupt .db files with the old correct backups obviously would fix this issue.
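Before starting a migrated node, it may be worth checking the copied databases. This is a sketch assuming the databases are *.db files in one directory and the node is stopped; check_databases is a hypothetical helper that runs SQLite's PRAGMA integrity_check on each file and keeps only the first line SQLite reports.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def check_databases(db_dir: str) -> Dict[str, str]:
    """Run PRAGMA integrity_check on every *.db file in db_dir.

    Returns file name -> "ok" for a healthy database, otherwise the first
    problem reported, or the error text if SQLite cannot read the file at all.
    Assumption: the node is stopped while this runs.
    """
    results = {}
    for db_path in sorted(Path(db_dir).glob("*.db")):
        try:
            con = sqlite3.connect(db_path)
            try:
                row = con.execute("PRAGMA integrity_check").fetchone()
                results[db_path.name] = row[0]  # first line of the report
            finally:
                con.close()
        except sqlite3.DatabaseError as exc:
            # e.g. "file is not a database" or "database disk image is malformed"
            results[db_path.name] = str(exc)
    return results
```

Catching sqlite3.DatabaseError covers files so damaged that SQLite refuses to read them at all, so one bad database does not stop the check for the others.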
The same .db files, yes, but earlier rsync runs left behind the .db-wal and .db-shm files that messed up your databases.
I have no doubt about your intentions in wanting to help others prevent this, and I appreciate your effort. But skipping step 8 is harmful, and your issue was likely self-inflicted by skipping that step. So despite your good intentions, it is unfortunately not good advice.
That said, I agree that you should keep the old data until the entire process is done. But the instructions don’t contradict that and data is not removed at any step before you finish if you follow them to the letter.