Database maintenance - Integrity check, Vacuum

I want to make a guide for newbies like me in SQL, for database maintenance, and with your help, to fill the gaps. I refer to a Docker storagenode installation on Linux and db operations on a Windows PC, but you can adapt it to your environment. I will present the slowest, manual method, just to make you understand the basics and the fact that it is not rocket science. I recommend reading and following the official guide because it is properly maintained, although I can't find an official guide for vacuum:
https://support.storj.io/hc/en-us/sections/360004515252-Databases-Issues

The databases you need to check and maintain from time to time are located in the storage dir.
In my case, Synology DSM, the path is /volume1/Storj/storage. There are 14 db-es.
There are 2 operations you can do for maintenance: integrity check and vacuum. If the db has problems and needs repair or replacement, just follow the official guide.
The first operation checks for a corrupted/malformed db, and the second one 'defrags' the db, aka compacts the records and recovers the empty space left behind by deleted records. This is a very non-technical explanation, but I believe you get the picture. Over time, records in the db are written, increasing the db size on disk, and deleted, leaving empty space behind, but the db doesn't shrink and doesn't free up the space. This is where vacuum comes in and compacts the db, freeing up space. It doesn't modify records and doesn't do anything destructive to the db.
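If you are curious how much reclaimable space a db actually has before you vacuum it, sqlite3 (set up as described below) can tell you: freelist_count is the number of free pages, and multiplied by page_size it gives roughly the number of bytes a vacuum would recover. The path here is just an example:

sqlite3 c:\downloads\bandwidth.db "PRAGMA page_size; PRAGMA page_count; PRAGMA freelist_count;"
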
You can do these operations on the spot or after copying the db-es to another location, but only after you stop the node. You don't need to remove the container, just stop the node. You will see that there are more files than just the db-es, like -wal and -shm files; those are SQLite write-ahead-log and shared-memory files, and they don't need any maintenance, because they are created while the node runs and cleaned up when the node shuts down cleanly. They are a good indicator of whether you stopped the node or not: if you still see them there, it means you didn't stop the node.
Assuming the container name is storagenode1:

docker stop -t 300 storagenode1

After you finish the operations, move the db-es back with overwrite (if you copied them) and restart the container:

docker restart -t 300 storagenode1

The db operations are done with sqlite3 - you need version 3.25.2 or newer:
https://www.sqlite.org/download.html

I use a Windows 11 x64 PC to check my db-es, so I download the Precompiled Binaries for Windows (sqlite-dll-win-x64 and sqlite-tools-win) and unzip both archives into C:\sqlite.
Then stop the node and copy all 14 db-es to C:\Downloads. Assuming these two paths, below are the commands you can run.
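How you copy them is up to you; if your storage dir is reachable from the Windows PC as an SMB share, robocopy from a Command Prompt can do it (the share path below is just an example, adjust it to your setup):

robocopy \\synology\Storj\storage C:\Downloads *.db
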
Open Command Prompt - user mode.

cd c:\sqlite

You can start sqlite3 and read the help page and quit like this:

sqlite3
.help
.quit

You don't need to start the sqlite3 environment; you will do the operations from the command prompt.
Integrity check:

cd c:\sqlite
sqlite3 c:\downloads\bandwidth.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\heldamount.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\info.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\notifications.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\orders.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\piece_expiration.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\piece_spaced_used.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\pieceinfo.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\pricing.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\reputation.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\satellites.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\secret.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\storage_usage.db "PRAGMA integrity_check;"
sqlite3 c:\downloads\used_serial.db "PRAGMA integrity_check;"

Vacuum:

cd c:\sqlite
sqlite3 c:\downloads\bandwidth.db "VACUUM;"
sqlite3 c:\downloads\heldamount.db "VACUUM;"
sqlite3 c:\downloads\info.db "VACUUM;"
sqlite3 c:\downloads\notifications.db "VACUUM;"
sqlite3 c:\downloads\orders.db "VACUUM;"
sqlite3 c:\downloads\piece_expiration.db "VACUUM;"
sqlite3 c:\downloads\piece_spaced_used.db "VACUUM;"
sqlite3 c:\downloads\pieceinfo.db "VACUUM;"
sqlite3 c:\downloads\pricing.db "VACUUM;"
sqlite3 c:\downloads\reputation.db "VACUUM;"
sqlite3 c:\downloads\satellites.db "VACUUM;"
sqlite3 c:\downloads\secret.db "VACUUM;"
sqlite3 c:\downloads\storage_usage.db "VACUUM;"
sqlite3 c:\downloads\used_serial.db "VACUUM;"

For the integrity check, you will get an 'ok' if the database is not corrupt/malformed.
For vacuum, you won't get any answer; you will just see the db get smaller in size.
I recovered 9% of space from a node with 600MiB db-es.
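For reference, a clean check simply prints ok under the command:

sqlite3 c:\downloads\bandwidth.db "PRAGMA integrity_check;"
ok
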
This is the most basic method and the slowest; you can/should use the official scripts to automate the process.
For a vacuum script, I found this one for docker-compose:
https://forum.storj.io/t/vacuum-databases-in-a-ramdisk-to-reduce-downtime/7066
If anyone is willing to share one for a non-Compose Docker installation, it would be much appreciated.
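Until someone shares a tested one, here is a rough, untested sketch of what it could look like for a plain docker run installation; it assumes the sqlite3 CLI is installed on the Docker host, and the container name and db directory must be adjusted to your setup:

#!/bin/bash
# Untested sketch - adjust the container name and db directory to your setup.
# Requires the sqlite3 CLI on the Docker host; run it only when you can afford the downtime.
CONTAINER=storagenode1
DBDIR=/volume1/Storj/storage

docker stop -t 300 "$CONTAINER"
for db in "$DBDIR"/*.db; do
    echo "$db: $(sqlite3 "$db" 'PRAGMA integrity_check;')"
    sqlite3 "$db" "VACUUM;"
done
docker start "$CONTAINER"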

In my case, on a Windows machine, I believe this works in PowerShell, with the sqlite3 binaries, not a full install:

cd C:\sqlite
Get-ChildItem C:\Downloads\*.db -File | %{$_.Name + " " + $(sqlite3.exe $_.FullName "PRAGMA integrity_check;")}
Get-ChildItem C:\Downloads\*.db -File | %{$_.Name + " " + $(sqlite3.exe $_.FullName "VACUUM;")}

Take note that the storagenode software always checks the integrity of the databases at start, and if it finds one with problems, it will stop and write a record in the log about the malformed db. So it is not necessary to do an integrity check if all looks good, the node works fine and the dashboard looks good, but it is good practice. Vacuum makes the system more responsive and it is good practice to run it from time to time, maybe once a year at minimum. Neither operation is a must if all looks good and runs fine.
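If you want to look for that log record, you can grep the node's log for it; 'malformed' is the usual keyword in the SQLite error (adjust the container name, or grep your log file instead if you redirect the logs):

docker logs storagenode1 2>&1 | grep -i malformed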

The databases are not critical for node operation; you can stop the node, delete all of them and start the node. You will lose the node's history on your end, but it doesn't change your payout or the satellite's records.
To increase your system responsiveness and node performance, you can move the db-es to another drive, away from the storagenode drive, like an SSD, M.2 NVMe or flash drive, something with good wear resistance and high I/O; see this guide:
https://forum.storj.io/t/how-to-move-db-s-to-ssd-on-docker/7064
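
Very roughly, that guide boils down to bind-mounting a folder from the fast drive into the container and pointing the node at it with the storage2.database-dir option in config.yaml; take the exact docker run mount and paths from the guide, the value below is only a placeholder:

storage2.database-dir: /app/dbs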

If you want to check the info in a db, like in piece_expiration.db, a good example is this:
https://forum.storj.io/t/quick-growth-of-piece-expiration-db/24965/34?u=snorkel
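
As a starting point for poking around yourself, the .tables command lists the table names and then you can count or sample rows; the table name piece_expirations below is an assumption, check the .tables output on your version first:

cd c:\sqlite
sqlite3 c:\downloads\piece_expiration.db ".tables"
sqlite3 c:\downloads\piece_expiration.db "SELECT COUNT(*) FROM piece_expirations;"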


Please note, you usually should not perform a vacuum, see

I read both main threads about vacuuming, and still didn't get a final answer to these questions:

  • is it good or bad to vacuum storagenode databases?
  • is it needed to vacuum storagenode databases?
  • does it depend on whether the db-es are placed on the node disk or on a separate medium like an SSD?

From these discussions I only got that there are a bunch of assumptions for either point of view, without hard proof. Even the devs' answer is a question mark.

From my first experience with vacuum and some reading and deduction, I can draw these conclusions for the moment:

  • vacuuming your db-es can recover ~10% of space;
  • for the current traffic, for nodes 3 years old or less, the gain is around 50-60MiB;
  • vacuum is not a long-term solution, because the node's activity will recreate the same db fragmentation and unused space in a relatively short time;
  • the space gain is so small that it is not worth the trouble - you need to stop the node, then restart it, which starts a new filewalker that hammers your node for another cycle, unnecessarily;
  • vacuuming the db-es and defragging them can improve performance only if they are stored on HDD; on SSD it is irrelevant;
  • the performance gains on HDD are still not clear and I'm not sure how long they last.

So, as a final conclusion: vacuuming is up to you, it is not mandatory, and only if you keep the db-es on HDD will you maybe see some improvements, at least for now.

As a manual operation, it's a small additional risk for a usually questionable benefit. Proper automation reduces operator risk, but the benefits are minuscule.

Only in very specific circumstances, like after migrating from very slow storage that resulted in excessive growth of databases,

No. You can vacuum a database on any storage.

Your conclusions look correct to me.


My opinion - it is useless; the databases will grow again and you will end up in the same state as before the vacuum.
However, if you have many issues like "database is locked", it could help for a while (but will not fix it permanently).
