I get a “database is locked” error. What is the reason? What is a possible solution?

Here is a good example of how HDD fragmentation affects databases:

And of course, it significantly affects storage for pieces (a lot of IO operations)

I think fragmentation on HDD is the root cause of database locks and performance degradation.

1 Like

If that is the case, then that explains why vacuuming the database fixes the problem: SQLite’s VACUUM works by reading the entire database while simultaneously writing a new version with the same effective contents into a new file, then moving the new file over the old one when that process is done. Releasing deleted records and clustering the data are the primary functions of VACUUM, but filesystem-level defragmentation also happens as a byproduct of how it is performed.

I’d be particularly interested to see filefrag output on the database files before and after vacuuming.
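Something like this would show it (a rough sketch, assuming the node is stopped and the filefrag and sqlite3 tools are installed; the path is just an example):

$ filefrag /storage/orders.db            # extent count before
$ sqlite3 /storage/orders.db "VACUUM;"   # rewrites the database into a new, contiguous file
$ filefrag /storage/orders.db            # extent count after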

2 Likes

Since vacuuming writes a new file, I assume fragmentation will be reduced, unless your disk is nearly full.
I defragment all databases daily. With BTRFS this can be done even while the files are in use. I haven’t vacuumed the databases yet, though.
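For reference, the in-place defragmentation is just something like this (a sketch; the path is an example and root is required):

$ sudo btrfs filesystem defragment -v /mnt/storj/storage/*.db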

I wrote about vacuum a few posts ago :slight_smile:

Please give me some time to collect statistics, and I will write another post about “how the databases are doing on the SSD” :slight_smile:

I solved locked/malformed SQLite databases by dumping each one and recreating it. Easy, fast (and safe if you know what you are doing), and I guess it reduces fragmentation too.
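Roughly like this (a sketch, assuming the node is stopped and the sqlite3 CLI is available; file names are examples):

$ sqlite3 orders.db ".dump" > orders.sql   # dump everything that is still readable
$ mv orders.db orders.db.bak               # keep the old file, just in case
$ sqlite3 orders.db < orders.sql           # recreate a fresh, compact database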

$ rsync -ah --progress orders.db orders.db.test
sending incremental file list
orders.db
571.36M 100% 32.52MB/s 0:00:16 (xfr#1, to-chk=0/1)

Far from perfect, but it seems to do just fine; maybe that’s because my orders.db is only a couple of months old.

Turned out the low maximum transfer speed was due to other factors rather than fragmentation.

This is the same as the VACUUM command :slight_smile:

1 Like

Loved the increased cache_size with my old >4 GB info.db.
BTW, vacuum never fixed malformed databases :rofl:
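For context, cache_size is a per-connection SQLite setting, so it only helps if it is set by whatever opens the database; from the CLI you can just inspect or experiment with it (a sketch; the file name and the 64 MiB value are examples):

$ sqlite3 info.db "PRAGMA cache_size;"                            # negative value = size in KiB; the default is -2000
$ sqlite3 info.db "PRAGMA cache_size=-65536; PRAGMA cache_size;"  # 64 MiB cache, for this session only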

Of course, but you talked about defragmentation

I migrated my databases to the SSD and collected statistics. As I wrote before, the databases were heavily fragmented, but after the migration to the SSD this problem is no longer relevant.

Let’s look at the statistics of the databases on the SSD:


As you can see, we have spikes every hour (orders being sent to the satellites).

Let’s take a more detailed look at one of these spikes


Here we can see a lot of IO operations (1600 reads + 1400 writes), but these requests are very small because the bandwidth is low (5.9 MiB/s for reads and 7.8 MiB/s for writes). This spike takes about 5 seconds on the SSD.
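In other words, rough arithmetic from the numbers above: 5.9 MiB/s ÷ 1600 reads/s ≈ 3.8 KiB per read, and 7.8 MiB/s ÷ 1400 writes/s ≈ 5.7 KiB per write, so each request is only a few KiB in size.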

Let’s also look into other interesting parameters of these spikes:

Here we can see that one operation takes about 2ms.

Here we can see the average bandwidth per operation

So, let’s summarize and draw a conclusion.

What I expected before starting the migration to the SSD:

  1. Solve the issue with “locks” - yes, it is solved. As we can see, every hour there are a lot of small IO operations that a regular HDD definitely cannot handle, but an SSD handles them easily. Also, more orders (traffic) = more small IO operations.
  2. No database performance degradation over time (one month with active ingress is enough for a heavy performance impact) - yes, this is also solved. An SSD has no heads to move, and filesystem fragmentation does not affect database performance. Before the migration, we saw huge database fragmentation on the HDD that severely impacted performance due to a lot of small IO operations.
  3. Offload the main storage for pieces and improve IOPS - yes, it definitely improved. Because we moved out the fragmented databases that needed a lot of small IO operations, the main storage for pieces is relieved of a lot of unnecessary head movements, and the speed of delivering pieces has improved.

Conclusion: Moving the databases to an SSD is an awesome feature that we must have, especially if you are an advanced storage node operator with a big storage node. A lot of small IO operations can easily kill the performance of any single HDD or any RAID of HDDs and produce a lot of problems like database locks, because the HDD will be too busy handling these operations.

11 Likes

That’s a very impressive and detailed study. Good job!

3 Likes

But wouldn’t copying the database remove the fragmentation? And as for my SSD: with my current zfs setup, my old SSD will only do 4k IOPS because of a Q1T1 bottleneck caused by a misconfiguration of the LUNs.

But I’m not seeing any of that. I get the hourly spike, but it doesn’t create any sort of backlog or the like on any of my drives, which at the moment are down to an 8 ms peak.

Of course I don’t really get fragmented files, because I turned that off by using sync=always in zfs. On another note, I haven’t really seen any of those “database is locked” messages for a while, but that may be because I finally got around to not rebooting my system again and again…
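For reference, that is just a dataset property (tank/storj is a placeholder dataset name):

$ zfs set sync=always tank/storj
$ zfs get sync tank/storj          # verify the current value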

Also, am I missing something, or do you not state what hardware you are testing on? Not that it really matters, but I did wonder a bit whether my 2x raidz1 of 4 and 5 drives is partly why it is able to keep up with the read/write IOPS for the db committing or verifying everything to disk, or whatever it’s doing every hour…

I might try to move the db eventually, just to see if my numbers change, but currently I don’t really see a need… Of course node size might play a very large role in db fragmentation and load on the system.
I would assume that if one doubles the capacity of a node, the db sizes and thus the load of working with them would also double, even though the load from the node’s data ingress/egress would stay pretty much the same.

So, what’s the accepted way of doing it with docker? As I understand it, the paths in config.yaml are relative, so I probably need to mount the SSD under /storj/node/ssd and specify “ssd” in the config file?
I am not super familiar with docker.
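My rough guess would be something like this (untested; the paths and the option name storage2.database-dir are assumptions on my part):

$ docker run -d ... \
    --mount type=bind,source=/mnt/ssd/storj-dbs,destination=/app/dbs \
    ... storjlabs/storagenode:latest

and then set the database location in config.yaml:

storage2.database-dir: dbs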

I see. I think I should prepare a short instruction on how to move the DBs with docker.
(I am on docker too.)
Please give me some time.

2 Likes

6 posts were split to a new topic: How to move DB’s to SSD on Docker

I think this should be a new topic so it can be categorized under SNO->FAQ.

1 Like

Yes, I agree, but I think this instruction is for advanced SNOs only; a newbie could kill their node…
Let’s ask @Alexey to review it and make a decision on whether to move it or not.

4 Likes

This is the part I’m not sure about. If it doesn’t crash, it could be online but fail to respond to audits. It could get into trouble quickly. It would be nice if it just crashed and would only be offline, especially if you’re monitoring that with Uptime Robot.

1 Like

Right. Crashing would be a best-case scenario. I use Zabbix to monitor all of my nodes so that would get caught quickly when it stops responding to API requests.

Loss of the databases isn’t a trivial matter and can hurt your storage node, maybe even get it DQed in worst-case scenarios, so I would stick to using a proper SSD.

Remember that the databases control a lot of stuff locally, so them getting corrupted or lost will most likely lead to some pretty messy situations… Even if they can be restored, one has to catch the error before it goes too far for either the data or the satellites to accept.

Though it might be an option to introduce automatic database restoration: if the system detects that the database is corrupted, it simply puts it back to its original state. Or it could keep the database in multiple locations and update one of them less frequently in one big sequential write…