Here is a good example of how HDD fragmentation affects databases:
And of course, it significantly affects storage for pieces too (a lot of IO operations).
I think fragmentation on HDD is the root cause of database locks and performance degradation.
If that is the case, then that explains why vacuuming the database fixes the problem: SQLite vacuum works by reading in the entire database while simultaneously writing out a new version of the database with the same effective contents into a new file, then moving the new file over the old file when that process is done. Releasing deleted records and clustering the data are the primary functions of vacuum, but filesystem-level defragmentation also happens as a byproduct of how vacuuming is performed.
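The rewrite-and-swap behavior is easy to observe with the sqlite3 CLI. A minimal sketch (throwaway path, requires sqlite3 and GNU stat; not the storagenode's actual databases):

```shell
# Create a throwaway database, fill it, then delete everything.
# Deleted pages only go to the freelist, so the file stays large.
db=/tmp/vacuum_demo.db
rm -f "$db"
sqlite3 "$db" "CREATE TABLE t(x BLOB);
               INSERT INTO t VALUES (randomblob(1048576));
               DELETE FROM t;"
before=$(stat -c%s "$db")

# VACUUM rewrites the contents into a new file and swaps it in,
# shrinking the file and defragmenting it as a side effect.
sqlite3 "$db" "VACUUM;"
after=$(stat -c%s "$db")
echo "before=$before after=$after"
```

The "after" size drops from over a megabyte to a few pages, since only live data is copied into the new file.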
I’d be particularly interested to see filefrag output on the database files before and after vacuuming.
Since vacuuming writes a new file, I assume fragmentation will be reduced, unless your disk is nearly full.
I defragment all databases daily. With BTRFS it can be done even if the files are in use. I haven’t vacuumed the databases yet though.
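For what it’s worth, a daily job for that could be a cron script along these lines (the path and file glob are illustrative; btrfs defragment can operate on files that are in use):

```shell
#!/bin/sh
# /etc/cron.daily/defrag-storj-dbs (illustrative path)
# Defragment the storagenode databases in place on a btrfs volume.
btrfs filesystem defragment -v /mnt/storj/storage/*.db
```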
I wrote about vacuum a few posts ago.
Please give me some time to collect statistics, and I will write another post about “how the weather is on SSD with databases”.
I solved locked/malformed SQLite databases by dumping each one and recreating it. Easy, fast (and safe if you know what you are doing), and I guess it reduces fragmentation too.
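The dump-and-recreate cycle looks roughly like this (demo path standing in for e.g. orders.db; assumes the node is stopped and sqlite3 is installed):

```shell
# Build a small demo database standing in for a storagenode db
db=/tmp/dump_demo.db
rm -f "$db" "$db.new"
sqlite3 "$db" "CREATE TABLE t(x); INSERT INTO t VALUES (1),(2),(3);"

# Dump to SQL text and replay it into a fresh file -- this can also
# recover from malformed-database states that VACUUM cannot.
sqlite3 "$db" ".dump" | sqlite3 "$db.new"

# Swap the rebuilt file in place of the original
mv "$db.new" "$db"
sqlite3 "$db" "SELECT count(*) FROM t;"   # prints 3
```

Because the replacement file is written sequentially in one pass, it also comes out far less fragmented than the original.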
$ rsync -ah --progress orders.db orders.db.test
sending incremental file list
orders.db
571.36M 100% 32.52MB/s 0:00:16 (xfr#1, to-chk=0/1)
Far from perfect, but it seems to do just fine; maybe that’s because my orders.db is only a couple of months old.
It turned out the low maximum transfer speed was due to other factors rather than fragmentation.
This is the same as the vacuum command.
I loved the increased cache_size with my old >4 GB info.db.
btw vacuum never fixed malformed databases
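For context, SQLite’s page cache is set per connection with PRAGMA cache_size, so the storagenode process itself has to issue it; a quick illustration with a throwaway database (the size here is an arbitrary example, not a recommendation):

```shell
db=/tmp/cache_demo.db
rm -f "$db"
# A negative cache_size is a size in KiB; -2000000 is roughly 2 GiB
# of page cache. Both statements run on the same connection, and the
# second one reads the value back.
sqlite3 "$db" "PRAGMA cache_size=-2000000; PRAGMA cache_size;"   # prints -2000000
```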
Of course, but you talked about defragmentation
I migrated my databases to the SSD and collected statistics. As I wrote before, the databases were heavily fragmented, but after migration to the SSD this problem is no longer relevant.
Let’s look into statistics of SSD with databases:
Let’s look in more detail at one of these spikes:
Let’s also look into other interesting parameters of these spikes:
Here we can see that one operation takes about 2ms.
Here we can see the average bandwidth per operation.
So, let’s summarize and make a conclusion.
What I expected before starting the migration to the SSD:
Conclusion: Moving the databases to an SSD is an awesome option that we should have, especially if you are an advanced storage node operator with a big node. A lot of small IO operations can easily kill the performance of any single HDD, or of any RAID of HDDs, and produce problems like database locks, because the HDD will be too busy handling these operations.
That’s a very impressive and detailed study. Good job!
But wouldn’t copying the database remove the fragmentation? As for my SSD: with my current zfs setup my old SSD will only do 4k IOPS, due to a Q1T1 bottleneck caused by a misconfiguration of the LUNs.
But I’m not seeing any of that. I get the hourly spike, but it doesn’t create any sort of backlog on any of my drives, which at the moment peak at 8 ms.
Of course I don’t really get fragmented files, because I turned that off by using sync=always in zfs. On another note, I haven’t seen any of those “database is locked” messages for a while, but that may be because I finally got around to not rebooting my system again and again…
Also, am I missing something, or don’t you state what hardware you are testing on? Not that it really matters, but I did wonder a bit whether my 2x raidz1 of 4 and 5 drives is part of why it can keep up with the read/write IOPS for the db committing or verifying everything to disk, or whatever it’s doing every hour…
I might try to move the db eventually, just to see if my numbers change, but currently I don’t really see a need… Of course node size might play a very large role in db fragmentation and load on the system.
I would assume that if one doubles the capacity of a node, the db sizes, and thus the load of working with them, would also double, even though the load from node data ingress/egress would stay pretty much the same.
So, what’s the accepted way of doing it with docker? As I understand it, the paths in config.yaml are relative, so I probably need to mount the SSD under /storj/node/ssd and specify “ssd” in the config file?
I am not super familiar with docker.
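One plausible shape for this, assuming a storagenode version that supports a separate database directory via the storage2.database-dir option (verify the option name against your version’s config.yaml), and with /mnt/ssd/storj-dbs as an assumed SSD path:

```shell
# Stop the node, copy the .db files to the SSD first, then restart
# with an extra bind mount for the database directory (paths assumed):
docker run -d --name storagenode \
  --mount type=bind,source=/mnt/hdd/storagenode,destination=/app/config \
  --mount type=bind,source=/mnt/ssd/storj-dbs,destination=/app/dbs \
  storjlabs/storagenode:latest
```

Then in config.yaml, set storage2.database-dir to the path as seen inside the container, e.g. /app/dbs. This is a sketch of the idea, not tested instructions.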
I see. I think I should prepare a short instruction on how to move the DBs with docker.
(I am on docker too.)
Please give me some time.
I think this should be a new topic so this can be categorized as SNO->FAQ
Yes, I agree, but I think this instruction is for advanced SNOs only; a newbie could kill their node…
Let’s ask @Alexey to review it and make a decision for the move or not.
This is the part I’m not sure about. If it doesn’t crash, it could stay online but fail to respond to audits. It could get into trouble quickly. It would be nice if it just crashed and would only be offline, especially if you’re monitoring that with Uptime Robot.
Right. Crashing would be a best-case scenario. I use Zabbix to monitor all of my nodes so that would get caught quickly when it stops responding to API requests.
Loss of the databases isn’t a trivial matter and can hurt your storagenode, maybe even get it DQed in worst-case scenarios, so I would stick to using a proper SSD.
Remember the databases control a lot of stuff locally, so them getting corrupted or lost can most likely lead to some pretty messy situations… Even if they can be restored, one has to catch the error before it goes too far for either the data or the satellites to accept.
Though it might be an option to introduce automatic database restoration: if the system detects corruption, simply put the database back from a known-good point. Or it could keep the database in multiple locations and update one of them less frequently, in one big sequential write…