I did stop the node before running @anon27637763's script.
The corrupted database file was used_serial.db.
Restoring it (command sqlite3 used_serial.db ".read dump_all_notrans.db") took ages.
I had initially started the process on the HDD, but it was so slow that I switched to the SD card. That was faster, but it still took roughly 1 hour to recreate this 25MB file!
That's frightening. Hope it did not kill my SD card…
I'm not sure all of that is normal, but the node is back online and for the moment, it seems like it's working…
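On the restore time: one possible explanation (an assumption on my part, not something confirmed in this thread) is that a dump replayed without a surrounding transaction commits every INSERT on its own, and each commit waits for the disk to sync. A minimal Python sketch of replaying a dump inside a single transaction is below. The function name and file paths are hypothetical, and it assumes the dump file is plain SQL with no BEGIN/COMMIT/ROLLBACK statements of its own.

```python
import sqlite3


def restore_from_dump(dump_path, db_path):
    """Replay a plain SQL dump into db_path inside one transaction.

    Hypothetical helper: assumes the dump holds no transaction
    statements of its own and that every statement ends with ';'.
    Returns the first row of PRAGMA integrity_check ("ok" if healthy).
    """
    with open(dump_path, encoding="utf-8") as handle:
        sql = handle.read()

    conn = sqlite3.connect(db_path)
    try:
        # One BEGIN/COMMIT pair around the whole script, so the data is
        # committed (and synced to disk) once instead of once per INSERT.
        conn.executescript("BEGIN;\n" + sql + "\nCOMMIT;")
        return conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()
```

If any statement in the dump fails, the exception propagates and the open transaction is discarded when the connection closes, so this is all-or-nothing. That is a different trade-off from a "notrans" replay, which keeps whatever rows it managed to insert.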
I still think such attempts at fixing a database should be handled by the node itself because apparently that happens regularly with sqlite databases. It could check all that after a version update for instance. Or on a regular basis.
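To illustrate what such a periodic check could look like, here is a rough sketch in Python using SQLite's built-in `PRAGMA integrity_check`. This is not how the storagenode works; the function names and the idea of scanning a directory for `*.db` files are my own assumptions.

```python
import sqlite3
from pathlib import Path


def check_database(path):
    """Return a list of problems found in one database (empty if healthy)."""
    # Open read-only so the check itself cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Raised, for example, when the file is not a database at all.
        return [str(exc)]
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages


def check_all(directory):
    """Check every *.db file in a directory; maps file name to problems."""
    return {p.name: check_database(p) for p in sorted(Path(directory).glob("*.db"))}
```

Something like `check_all("/path/to/databases")` (path hypothetical) would then return an empty list for each healthy file and the reported errors for the others. It should only be run while the node is stopped.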
In my case, I had a power outage a few days ago. Could it be the root cause of this malformed database? If so, surely it will happen from time to time to many SNOs in the future.
Just my 2 cts
No. No, absolutely not, unless the storagenode software is doing something incredibly unsafe with the database, such as using it without a journal.
Both rollback and WAL journal modes should 100% protect the database from becoming malformed even in the event of a sudden power cut.
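For anyone who wants to see which mode a given database file is using, the following is a small Python sketch with the standard `sqlite3` module. I do not know which settings the storagenode applies to its own connections, so this only shows how the two pragmas can be queried and how WAL can be switched on. The helper names are hypothetical.

```python
import sqlite3


def journal_settings(path):
    """Return (journal_mode, synchronous) as seen by a fresh connection."""
    conn = sqlite3.connect(path)
    try:
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        # synchronous is reported as an integer:
        # 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA
        sync = conn.execute("PRAGMA synchronous").fetchone()[0]
    finally:
        conn.close()
    return mode, sync


def enable_wal(path):
    """Switch the database to WAL mode; returns the mode SQLite reports."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()
```

Note that the journal mode "wal" is stored in the database file and persists, while `synchronous` is a per-connection setting, so the value seen here is only the default for a new connection and not necessarily what another program uses.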
@Alexey, if you believe that this was caused by a power outage, then you should report a critical bug to the developers (either storagenode or SQLite). This should not happen.
I can find no such quote in their documentation. The closest thing hints that drives that do not honor a sync request may lose/corrupt data because they report success before the data is actually persisted (which is a storage-level bug/misfeature, and the filesystem driver or anything else that wants to ensure writes actually happen would suffer the same problem). So "100%" in my post may be a bit hyperbolic… but these issues seem to crop up an awful lot. More than I've ever seen before with SQLite, which makes me believe there is a software bug somewhere.
If a filesystem causes such corruption then it would be a filesystem-level bug.
Since SQLite databases are ordinary disk files, any malfunction in the filesystem can corrupt the database. Filesystems in modern operating systems are very reliable, but errors do still occur. For example, on 2013-10-01 the SQLite database that holds the Wiki for Tcl/Tk went corrupt a few days after the host computer was moved to a dodgy build of the (linux) kernel that had issues in the filesystem layer. In that event, the filesystem eventually became so badly corrupted that the machine was unusable, but the earliest symptom of trouble was the corrupted SQLite database.
If a machine loses power, the filesystem can become broken… or a number of filesystem issues may cause the open databases to have errors.
But read the whole page…
If you want to test this out, create a new node ID and run it for a few weeks to get some data, then furiously unplug the machine and plug it back in… iterate until one or more sqlite3 DBs become corrupt, and report back your results.
I mean, if the Linux filesystem drivers are that unreliable, why are companies running mission-critical production databases on Linux filesystems? Is this sensitivity to power cuts somehow unique to SQLite and not full-blown RDBMSes?
This issue is not related to the Storj node software… and thus is very unlikely to be fixed by Storj developers, since it lies outside the scope of the software.
Perhaps a different database would be preferable. However, my understanding is that at least some of the data contained in the databases are being moved into the data directory structure. So, the choices seem to be:
Develop a database-agnostic Go code base ourselves and issue a pull request on GitHub…
Deal with the current situation as best as we can… I personally have had no show stoppers with the DBs.
Move the DBs off the data drive and see if that improves the situation.