"Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked",

Thx for the convenient script @anon27637763 :slight_smile:

Really feels like such procedures should be part of a regular housekeeping routine nodes should do on their own.


One of my nodes returns an error while running one of the commands apparently:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
wrong # of entries in index pk_used_serial_

Is this concerning? Should/Can't it be fixed?

Yes. Your database looks corrupted. Please fix it:

And I hope that you performed the vacuuming on the stopped node.


Thx @Alexey, very helpful as always :slight_smile:

I did stop the node before running @anon27637763's script.

The corrupted database file was used_serial.db.
Restoring it (command sqlite3 used_serial.db ".read dump_all_notrans.db") took ages.
I initially had started the process on the HDD, but because it was so slow I switched to the SD card which did go faster, but it still took roughly 1 hour to recreate this 25MB file! :scream:
That's frightening. Hope it did not kill my SD card... :sweat_smile:
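For reference, here is a minimal Python sketch of the same dump-and-replay idea, not the procedure from the linked instructions. It assumes the source file is still readable enough to dump (on a badly damaged file `iterdump()` may raise `sqlite3.DatabaseError`), and the function name `rebuild_database` and the file paths are hypothetical. It drops the dump's own transaction lines, similar in spirit to the "notrans" file above, and replays everything inside one explicit transaction. In autocommit mode SQLite commits every statement separately, which is probably part of why a statement-by-statement replay is so slow on an HDD or SD card.

```python
import sqlite3

# Transaction control lines that a dump may contain and that we do not replay.
TRANSACTION_LINES = {"BEGIN TRANSACTION;", "COMMIT;", "ROLLBACK;"}


def rebuild_database(src_path, dst_path):
    """Replay a SQL dump of src_path into a fresh database at dst_path.

    Returns the rows of PRAGMA integrity_check run on the new database.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields the database as a sequence of SQL statements.
        statements = [
            stmt for stmt in src.iterdump()
            if stmt.strip().upper() not in TRANSACTION_LINES
        ]
        # Wrap the whole replay in a single transaction: one commit at the end
        # instead of one commit per INSERT.
        script = "BEGIN;\n" + "\n".join(statements) + "\nCOMMIT;"
        dst.executescript(script)
        return dst.execute("PRAGMA integrity_check").fetchall()
    finally:
        src.close()
        dst.close()
```

As with the sqlite3 command, this should only be run while the node is stopped, and on a copy of the file rather than the original.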

I'm not sure all of that is normal, but the node is back online and for the moment, it seems like it's working...

I still think such attempts at fixing a database should be handled by the node itself because apparently that happens regularly with sqlite databases. It could check all that after a version update for instance. Or on a regular basis.
In my case, I had a power outage a few days ago. Could it be the root cause of this malformed database? If so, surely it will happen from time to time to many SNOs in the future.
Just my 2 cts :slight_smile:

You can create an idea there: Storage Node feature requests - voting - Storj Community Forum (official)

Most likely yes.

Alright why not.

Here it is: Nodes autorepair :slight_smile:


No. No, absolutely not, unless the storagenode software is doing something incredibly unsafe with the database, such as using it without a journal.

Both rollback and WAL journal modes should 100% protect the database from becoming malformed even in the event of a sudden power cut.

@Alexey, if you believe that this was caused by a power outage, then you should report a critical bug to the developers (either storagenode or SQLite). This should not happen.
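For anyone who wants to look at this on their own node, here is a minimal Python sketch that reads the journal mode from a database file. The function name `journal_mode` is hypothetical. One caveat: only WAL mode is stored in the file itself; the other modes are set per connection, so this tells you whether a file is in WAL mode but not which rollback-journal settings the storagenode process uses at runtime.

```python
import sqlite3


def journal_mode(db_path):
    """Return the journal mode SQLite reports for a new connection to db_path."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA journal_mode without an argument only queries the mode.
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()
```

Run it only against a stopped node (or a copy of the file), for example `journal_mode("used_serial.db")`.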

SQLite has a statement in their documentation on such errors...

Pretty much the statement is:

SQLite does its best, but sometimes filesystems mess up database files when the power is shutdown abruptly.


I can find no such quote in their documentation. The closest thing hints that drives that do not honor a sync request may lose/corrupt data because they report success before the data is actually persisted (which is a storage-level bug/misfeature, and the filesystem driver or anything else that wants to ensure writes actually happen would suffer the same problem). So "100%" in my post may be a bit hyperbolic... but these issues seem to crop up an awful lot. More than I've ever seen before with SQLite, which makes me believe there is a software bug somewhere.

If a filesystem causes such corruption then it would be a filesystem-level bug.

SQLite: How to Corrupt a Database

6.3. Filesystem Corruption

Since SQLite databases are ordinary disk files, any malfunction in the filesystem can corrupt the database. Filesystems in modern operating systems are very reliable, but errors do still occur. For example, on 2013-10-01 the SQLite database that holds the Wiki for Tcl/Tk went corrupt a few days after the host computer was moved to a dodgy build of the (linux) kernel that had issues in the filesystem layer. In that event, the filesystem eventually became so badly corrupted that the machine was unusable, but the earliest symptom of trouble was the corrupted SQLite database.

If a machine loses power, the filesystem can become broken... or a number of filesystem issues may cause the open databases to have errors.

But read the whole page...

If you want to test this out, create a new node ID, run it for a few weeks to get some data, and then furiously unplug the machine and plug it back in... Iterate until one or more sqlite3 DBs become corrupt and report back your results.


Looks like this is the problem...

I mean, if the Linux filesystem drivers are that unreliable, why are companies running mission-critical production databases on Linux filesystems? Is this sensitivity to power cuts somehow unique to SQLite and not full-blown RDBMSes?


DuckDuckGo search: "sqlite malformed database"

This issue is not related to the Storj node software... and thus is very unlikely to be fixed by Storj developers, since it lies outside the scope of the software.

Perhaps a different database would be preferable. However, my understanding is that at least some of the data contained in the databases are being moved into the data directory structure. So, the choices seem to be:

  1. Develop a database-agnostic Go code base ourselves and issue a pull request on GitHub...
  2. Deal with the current situation as best as we can... I personally have had no show stoppers with the DBs.
  3. Move the DBs off the data drive and see if that improves the situation.

The db could run in a separate Docker container if the whole thing was managed by docker-compose.

This would make the use of a db like postgres quite easy.


I believe the script should look like this:

  • stop storagenode;
  • first step - run the PRAGMA integrity check;
  • if not OK or error, then stop;
  • second step - run the VACUUM;
  • if not OK or error, then stop;
  • third step - run the PRAGMA integrity check;
  • if not OK or error, then stop;
  • start storagenode.

This prevents vacuuming a malformed db.
I apologise if I am mistaken; I'm not a programmer or SQL operator.
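The check, vacuum, check sequence above could be sketched roughly like this in Python. This is only an illustration under assumptions, not the script discussed earlier in the thread: the names `integrity_ok` and `check_vacuum_check` are hypothetical, and stopping and starting the storagenode is left to the operator because it depends on the setup.

```python
import sqlite3


def integrity_ok(conn):
    """Return True only if PRAGMA integrity_check reports a single 'ok' row."""
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Severely damaged files can fail before the check even runs.
        return False
    return rows == [("ok",)]


def check_vacuum_check(db_path):
    """Run integrity check -> VACUUM -> integrity check, stopping on failure.

    Assumes the node is already stopped. Returns a short status string.
    """
    # isolation_level=None puts the connection in autocommit mode, so VACUUM
    # never runs inside an implicit transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        if not integrity_ok(conn):
            return "integrity check failed before VACUUM, stopping"
        try:
            conn.execute("VACUUM")
        except sqlite3.DatabaseError as exc:
            return "VACUUM failed: %s" % exc
        if not integrity_ok(conn):
            return "integrity check failed after VACUUM, stopping"
        return "ok"
    finally:
        conn.close()
```

The node would only be started again if every database returns "ok"; anything else means the file needs to be repaired first.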