Vacuum databases in a ramdisk to reduce downtime

I have a bunch of nodes and tested runtime of vacuuming the databases in-place against copying them to a ramdisk, vacuuming them there, and moving them back to the disk. The results are quite impressive.

The orders database takes the overwhelming majority of the time so I’m including its size before and after. Both nodes run on the same host (two Docker containers). They run on different HDDs; both are exactly the same model.

Node 1: 300MB orders database. Vacuuming all databases in-place took 129 seconds. After, the orders database was 157MB.

Node 2: 380MB orders database. Moving all databases to a ramdisk, vacuuming them, and moving them back took 9 seconds. After, the orders database was 95MB.

That’s a whopping 93% reduction in the downtime required to vacuum the databases. Based on this, I would guess that the majority of the in-place vacuum time is spent waiting for the HDD to seek back and forth between the old and new database files, and possibly performing out-of-order reads on the old database. Even with the overhead of moving the databases to and from a ramdisk, the ramdisk approach is substantially faster, most likely because each file is read or written exactly once and in order.

In-place vacuum script:

docker-compose stop -t 300 && \
for i in data/storage/*.db; do
  echo "$i"
  sqlite3 "$i" 'VACUUM;'
done && \
docker-compose up -d

Ramdisk vacuum script (assumes that /dev/shm is a tmpfs, which is the default on many Linux distributions):

docker-compose stop -t 300 && \
mkdir /dev/shm/storj && \
cp -av data/storage/*.db* /dev/shm/storj/ && \
for i in /dev/shm/storj/*.db; do
  echo "Vacuum: $i"
  sqlite3 "$i" 'VACUUM;'
done && \
mv -v /dev/shm/storj data/storage/database-temp && \
sync -f data/storage/database-temp && \
rm -fv data/storage/*.db* && \
mv -v data/storage/database-temp/* data/storage/ && \
rmdir data/storage/database-temp && \
docker-compose up -d
6 Likes

What would be the real impact on a node? You also assume your HDs are a benchmark for all SNOs. You also assume that VACUUM 300MB to 157 would have a huge real-time difference on your node. Where is that proof?

1 Like

What is the purpose behind such a combative reply when all I did was share information about what happened on my own nodes? Please back off a bit.

Less downtime… means more uptime. Which is good for obvious reasons. You can’t earn egress traffic or acquire new data while the node is offline.

I never said that anywhere and I did not intend to imply that. I’m showing stats from what happened on my own hardware with my specific databases. Other SNOs can compare performance of the approaches on their own nodes if they would like. In fact I would love to see other SNOs test and see which is faster and report their metrics.

My expectation is that SNOs storing their databases on HDDs will see similar results but I have no evidence that is the case, and – again – I am not claiming that to be true. Though it would not surprise me if it were true.

I don’t expect SSDs would necessarily see much improvement. Some, perhaps, but not 90+%.

I don’t know what this sentence means. If you are asking whether vacuuming is even necessary, it can reduce fragmentation of the databases and thereby improve access times. This has been documented in other forum posts when slow write performance due to fragmentation caused excessive lock times.
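
If you want a rough sense of how much reclaimable garbage a database is carrying before deciding whether a vacuum is worth the downtime, SQLite exposes the free-page count. A small sketch (the helper name is mine, not part of the scripts above):

```shell
# db_free_pages: print free (reclaimable) pages vs. total pages
# for one SQLite database file.
db_free_pages() {
  free=$(sqlite3 "$1" 'PRAGMA freelist_count;')
  total=$(sqlite3 "$1" 'PRAGMA page_count;')
  echo "$1: $free of $total pages are free"
}
```

For example, `for i in data/storage/*.db; do db_free_pages "$i"; done`. Both pragmas are read-only, though you may still hit a busy lock if the node is running.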

The reduction in the size of the database is not the primary benefit, but it does illustrate the amount of garbage-collection that took place. This is necessary to compare the duration of each approach as a wildly different database size could also contribute to the difference in vacuum duration.

For example, vacuuming the second database probably required less I/O since the resulting database size was smaller, indicating less data had to be read/written. However, that should not account for a 93% reduction in duration… maybe 30% based on the relative sizes.

4 Likes

Are you saying I don’t have the right to have an opinion? Perfect

Thanks. I just ran a modified version of your in-place script. It finished pretty fast, but my DBs are not very old and I had also defragged them recently. I had planned to eventually vacuum them one at a time when I got bored enough, but this script made it easy to do them all at once.

1 Like

Not at all, but creating a strawman (“You also assume your HDs are a benchmark for all SNOs”) and then attacking that is rude and fallacious.

I never set out to prove anything, only provide information about what happened on my systems. Making demands for proof is silly in this context.

Try it yourself and see what happens! I’m interested to see numbers on the difference between the two approaches from other SNOs, regardless of which approach turns out to be generally faster. Include whether it was an HDD or SSD.
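
For anyone measuring: a minimal sketch that reports total wall-clock seconds for the in-place variant (the helper name is mine; only run this while the node is stopped):

```shell
# vacuum_all: vacuum every .db file in a directory and report the
# total wall-clock seconds taken. Only run while the node is stopped.
vacuum_all() {
  start=$(date +%s)
  for i in "$1"/*.db; do
    echo "Vacuum: $i"
    sqlite3 "$i" 'VACUUM;'
  done
  echo "Total: $(( $(date +%s) - start )) seconds"
}
```

For example, `vacuum_all data/storage` between the `docker-compose stop` and `docker-compose up` steps.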

9 Likes

5 posts were split to a new topic: Is the vacuum necessary?

4 posts were merged into an existing topic: Is the vacuum necessary?

Alexey decided to split some comments, making it even more of a mess.

Spaghetti forum-ing…

Organization of threads is a tough job. Frankly, I’m impressed with the management job here.

5 Likes

The split comments were questioning whether vacuuming is even necessary, while this topic is about vacuuming the databases in a ramdisk to make the vacuum process faster. They didn’t belong in this thread.

1 Like

Someone told me that vacuum is built into the latest version, so it's no longer necessary to do it manually - is this accurate information?
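
One thing you can check yourself is whether SQLite's own auto-vacuum is enabled on a given database; 0 means it is off and a manual VACUUM remains the only way to reclaim free pages. A sketch (the helper name is mine):

```shell
# show_auto_vacuum: print the auto_vacuum mode of one SQLite database.
# 0 = none (manual VACUUM needed), 1 = full, 2 = incremental.
show_auto_vacuum() {
  echo "$1: $(sqlite3 "$1" 'PRAGMA auto_vacuum;')"
}
```

For example, `for i in data/storage/*.db; do show_auto_vacuum "$i"; done`.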

For vacuuming DBs on the Windows GUI, I recommend SQL Browser; it can vacuum a DB with a normal human interface, no scripts, and it can also run integrity checks.
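
The same integrity check is also available from the sqlite3 command line for anyone not on Windows (sketch; the helper name is mine):

```shell
# check_db: run SQLite's built-in consistency check on one database.
# Prints "ok" if no corruption is found.
check_db() {
  sqlite3 "$1" 'PRAGMA integrity_check;'
}
```

For example, `check_db data/storage/bandwidth.db` after a vacuum, before starting the node back up.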

I’m not sure why I can’t edit the original post anymore (ping @Alexey) but the ramdisk-vacuum script is dangerous and needs to be fixed. If stopping the storagenode container times out, one or more WALs may still exist. This is not currently accounted for in the script and this causes two very bad things to happen:

  1. Anything in the WAL that could be replayed is lost during the vacuum.
  2. When storagenode starts back up after the databases are moved back, sqlite tries to replay the old WAL against the newly-vacuumed database and this corrupts the database.

This happened to me and corrupted my bandwidth DB. I was able to fix it with .dump and manually cleaning up corruption in the generated script, then rebuilding the database from that script.

The vacuum script is fixed by:

  1. Copying any WALs to the ramdisk (replace *.db with *.db*).
  2. Removing all database files prior to moving the vacuumed databases back because the WAL will have been deleted by the vacuum process (add rm *.db*).
  3. To make (2) safe in the case of an unexpected shutdown, we instead move the databases to a subdirectory of the storagenode data directory; if a poweroff happens, then the database files are guaranteed to be there (but may need to be manually moved back).

New script:

docker-compose stop -t 300 && \
mkdir /dev/shm/storj && \
cp -av data/storage/*.db* /dev/shm/storj/ && \
for i in /dev/shm/storj/*.db; do
  echo "Vacuum: $i"
  sqlite3 "$i" 'VACUUM;'
done && \
mv -v /dev/shm/storj data/storage/database-temp && \
sync -f data/storage/database-temp && \
rm -fv data/storage/*.db* && \
mv -v data/storage/database-temp/* data/storage/ && \
rmdir data/storage/database-temp && \
docker-compose up -d
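
Another defensive option, not part of the script above, is to checkpoint and truncate each WAL right after stopping the container, so there is nothing left to replay in the first place (sketch; the helper name is mine):

```shell
# checkpoint_wal: flush a database's write-ahead log into the main
# file and truncate it, so no -wal content remains to replay later.
checkpoint_wal() {
  sqlite3 "$1" 'PRAGMA wal_checkpoint(TRUNCATE);'
}
```

For example, `for i in data/storage/*.db; do checkpoint_wal "$i"; done` before the copy step.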

version 1.0 of anything almost always fails in some critical aspect over time… I also never buy 1st-gen tech :smiley: for the exact same reason

3 Likes

Made it a wiki, now you can edit it again

3 Likes

@cdhowie
Hi, I just made my first attempt to vacuum one of the databases, the biggest one - orders.db
It reduced the size from 500MB to 24KB … are you sure that this process is safe?
I am not copying it back (still running the old “before vacuum” version) as it seems too good to be true for size reduction.

Yes. It’s safe.

Vacuuming a DB rebuilds the file and reclaims the space left behind by prior row operations, such as deletes and updates. None of the current data in the database tables is affected.

Storj nodes will generally see a very large reduction in database size due to the large number of prior row operations that are stored in the database.
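
If you are curious how much a single vacuum reclaims, you can record the file size on either side of it (sketch; the helper name is mine, and the node must be stopped first):

```shell
# vacuum_report: vacuum one database and print its size before/after.
# Only run while the node is stopped.
vacuum_report() {
  before=$(wc -c < "$1")
  sqlite3 "$1" 'VACUUM;'
  after=$(wc -c < "$1")
  echo "$1: $before -> $after bytes"
}
```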

2 Likes

Why isn’t this automated?

You need to turn the node off to do it, and it should be done under the control of the operator.