"Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked",

Pac · May 22, 2020, 8:54am

Thx for the convenient script @anon27637763

It really feels like such procedures should be part of a regular housekeeping routine that nodes run on their own.

Apparently one of my nodes returns an error while running one of the commands:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
wrong # of entries in index pk_used_serial_

Is this concerning? Can/should it be fixed?

Alexey · May 22, 2020, 7:37pm

Yes. Your database looks corrupted. Please fix it:

And I hope that you ran the vacuum on a stopped node.

Pac · May 23, 2020, 9:20am

Thx @Alexey, very helpful as always

I did stop the node before running @anon27637763’s script.

The corrupted database file was used_serial.db.
Restoring it (with the command sqlite3 used_serial.db ".read dump_all_notrans.db") took ages.
I had initially started the process on the HDD, but it was so slow that I switched to the SD card. That was faster, but it still took roughly 1 hour to recreate this 25MB file!
That’s frightening. I hope it didn’t kill my SD card…
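
For readers who want to see the shape of a dump-and-reload outside the sqlite3 shell, here is a minimal Python sketch. It is only an analogue of the command above, not the procedure Alexey linked: the function name rebuild_from_dump is made up for illustration, and it assumes the source file is still readable enough to be dumped (a badly damaged file may raise an error partway through).

```python
import sqlite3

def rebuild_from_dump(src_path, dst_path):
    """Copy everything readable from src_path into a new database at dst_path.

    Hypothetical helper: dst_path should not exist yet (or be empty),
    otherwise the CREATE statements in the dump will fail.
    Returns the first row of PRAGMA integrity_check on the new file.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields SQL text wrapped in BEGIN TRANSACTION / COMMIT,
        # so the whole reload runs as a single transaction.
        dst.executescript("\n".join(src.iterdump()))
        return dst.execute("PRAGMA integrity_check;").fetchone()[0]
    finally:
        src.close()
        dst.close()
```

Run it only while the node is stopped, and keep the original file until the rebuilt one has been checked.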

I’m not sure all of that is normal, but the node is back online and for the moment, it seems like it’s working…

I still think such attempts at fixing a database should be handled by the node itself, because apparently this happens regularly with SQLite databases. It could check all that after a version update, for instance. Or on a regular basis.
In my case, I had a power outage a few days ago. Could that be the root cause of this malformed database? If so, it will surely happen from time to time to many SNOs in the future.
Just my 2 cts

Alexey · May 23, 2020, 12:44pm

You can post it as an idea here: Storage Node feature requests - voting - Storj Community Forum (official)

Most likely yes.

Pac · May 25, 2020, 9:10am

Alright, why not.

Here it is: Nodes autorepair

cdhowie · May 25, 2020, 12:19pm

No. No, absolutely not – unless the storagenode software is doing something incredibly unsafe with the database, such as using it without a journal.

Both rollback and WAL journal modes should 100% protect the database from becoming malformed even in the event of a sudden power cut.

@Alexey, if you believe that this was caused by a power outage, then you should report a critical bug to the developers (either storagenode or SQLite). This should not happen.
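
To see which journal mode a database file reports, a small Python sketch like the one below can help. The function name journal_mode is made up for illustration. One caveat: only WAL mode is stored persistently in the file. The other modes are per-connection settings, so this shows what a fresh connection gets, not necessarily what storagenode sets on its own connections.

```python
import sqlite3

def journal_mode(db_path):
    """Return the journal mode a new connection to db_path reports.

    Hypothetical helper. A result of 'wal' means the file is in WAL mode;
    anything else is just the default for a newly opened connection.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode;").fetchone()[0]
    finally:
        conn.close()
```

If this returns 'off' or 'memory' on a fresh connection, the SQLite build itself has an unusual default, which would be worth mentioning in a bug report.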

anon27637763 · May 25, 2020, 1:40pm

SQLite has a statement in their documentation on such errors…

Pretty much the statement is:

SQLite does its best, but sometimes filesystems mess up database files when the power is shut down abruptly.

cdhowie · May 25, 2020, 7:14pm

I can find no such quote in their documentation. The closest thing hints that drives that do not honor a sync request may lose/corrupt data because they report success before the data is actually persisted (which is a storage-level bug/misfeature – and the filesystem driver or anything else that wants to ensure writes actually happen would suffer the same problem). So “100%” in my post may be a bit hyperbolic… but these issues seem to crop up an awful lot. More than I’ve ever seen before with SQLite, which makes me believe there is a software bug somewhere.

If a filesystem causes such corruption then it would be a filesystem-level bug.

anon27637763 · May 25, 2020, 7:28pm

SQLite: How to Corrupt a Database

6.3. Filesystem Corruption

Since SQLite databases are ordinary disk files, any malfunction in the filesystem can corrupt the database. Filesystems in modern operating systems are very reliable, but errors do still occur. For example, on 2013-10-01 the SQLite database that holds the Wiki for Tcl/Tk went corrupt a few days after the host computer was moved to a dodgy build of the (linux) kernel that had issues in the filesystem layer. In that event, the filesystem eventually became so badly corrupted that the machine was unusable, but the earliest symptom of trouble was the corrupted SQLite database.

If a machine loses power, the filesystem can become broken… or a number of filesystem issues may cause the open databases to have errors.

But read the whole page…

If you want to test this out, create a new node ID, run it for a few weeks to get some data, and then furiously unplug the machine and plug it back in… Iterate until one or more sqlite3 DBs become corrupt, then report back your results.

cdhowie · May 26, 2020, 12:09am

Looks like this is the problem…

I mean, if the Linux filesystem drivers are that unreliable, why are companies running mission-critical production databases on Linux filesystems? Is this sensitivity to power cuts somehow unique to SQLite and not full-blown RDBMSes?

anon27637763 · May 26, 2020, 3:02am

DuckDuckGo search: “sqlite malformed database”

This issue is not related to the Storj node software… and thus is very unlikely to be fixed by Storj developers, since it lies outside the scope of the software.

Perhaps a different database would be preferable. However, my understanding is that at least some of the data contained in the databases are being moved into the data directory structure. So, the choices seem to be:

Develop a database-agnostic Go code base ourselves and issue a pull request on GitHub…
Deal with the current situation as best we can… I personally have had no show stoppers with the DBs.
Move the DBs off the data drive and see if that improves the situation.

Pac · May 28, 2020, 9:44pm

The db could run in a separate Docker container if the whole thing was managed by docker-compose.

This would make the use of a db like Postgres quite easy.

snorkel · January 24, 2024, 8:52pm

I believe the script should look like this:

stop storagenode
first step - run PRAGMA integrity_check;
if not OK or error, then stop;
second step - run VACUUM;
if not OK or error, then stop;
third step - run PRAGMA integrity_check again;
if not OK or error, then stop;
start storagenode.
This prevents vacuuming a malformed db.
I apologise if I’m mistaken; I’m not a programmer or SQL operator.
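
The check, vacuum, check order above can be sketched in Python roughly as follows. This is a minimal illustration, not the script from earlier in the thread: the function names integrity_ok and safe_vacuum are made up, and stopping and starting the storagenode is left to the operator because the right command depends on how the node is run.

```python
import sqlite3

def integrity_ok(conn):
    """True only when PRAGMA integrity_check returns the single row 'ok'."""
    return conn.execute("PRAGMA integrity_check;").fetchall() == [("ok",)]

def safe_vacuum(db_path):
    """Check, vacuum, check again; stop at the first failure.

    Hypothetical helper. Run it only while the node is stopped.
    Returns 'ok' or a short description of where it stopped.
    """
    # isolation_level=None keeps the connection in autocommit mode,
    # because VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        if not integrity_ok(conn):
            return "failed before vacuum"
        conn.execute("VACUUM;")
        if not integrity_ok(conn):
            return "failed after vacuum"
        return "ok"
    except sqlite3.DatabaseError as exc:
        # Raised, for example, when the file is not a readable database.
        return f"error: {exc}"
    finally:
        conn.close()
```

Calling safe_vacuum on each .db file and starting the node only if every call returns "ok" would match the steps listed above.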