Vacuum databases in a ramdisk to reduce downtime

I have a bunch of nodes and tested runtime of vacuuming the databases in-place against copying them to a ramdisk, vacuuming them there, and moving them back to the disk. The results are quite impressive.

The orders database takes the overwhelming majority of the time so I’m including its size before and after. Both nodes run on the same host (two Docker containers). They run on different HDDs; both are exactly the same model.

Node 1: 300MB orders database. Vacuuming all databases in-place took 129 seconds. After, the orders database was 157MB.

Node 2: 380MB orders database. Moving all database to a ramdisk, vacuuming them, and moving them back took 9 seconds. After, the orders database was 95MB.

That’s a whopping 93% reduction in the downtime required to vacuum databases. I would guess based on this that the majority of the in-place vacuum is spent waiting for the HDD to seek back and forth between the old and new databases, and possibly performing out-of-order reads on the old database. Even with the overhead of moving the databases to and from a ramdisk, the ramdisk approach is substantially faster most likely because each file is read or written once and in-order.

In-place vacuum script:

docker-compose stop -t 300 && \
for i in data/storage/*.db; do
  echo "$i"
  sqlite3 "$i" 'VACUUM;'
done && \
docker-compose up -d

Ramdisk vacuum script (assumes that /dev/shm is a tmpfs, which is the default on many Linux distributions):

docker-compose stop -t 300 && \
mkdir /dev/shm/storj && \
cp -av data/storage/*.db /dev/shm/storj/ && \
for i in /dev/shm/storj/*.db; do
  echo "$i"
  sqlite3 "$i" 'VACUUM;'
done && \
mv -v /dev/shm/storj/*.db data/storage/ && \
docker-compose up -d
6 Likes

What would be the real impact on a node? You also assume your HDs are a benchmark for all SNOs. You also assume that VACUUM 300MB to 157 would have a huge real-time difference on your node. Where is that proof?

1 Like

What is the purpose behind such a combative reply when all I did is share information about what happened on my own nodes? Please back off a bit.

Less downtime… means more uptime. Which is good for obvious reasons. You can’t earn egress traffic or acquire new data while the node is offline.

I never said that anywhere and I did not intend to imply that. I’m showing stats from what happened on my own hardware with my specific databases. Other SNOs can compare performance of the approaches on their own nodes if they would like. In fact I would love to see other SNOs test and see which is faster and report their metrics.

My expectation is that SNOs storing their databases on HDDs will see similar results but I have no evidence that is the case, and – again – I am not claiming that to be true. Though it would not surprise me if it were true.

I don’t expect SSDs would necessarily see much improvement. Some, perhaps, but not 90+%.

I don’t know what this sentence means. If you are asking whether vacuuming is even necessary, it can reduce fragmentation of the databases and thereby improve access times. This has been documented in other forum posts when slow write performance due to fragmentation caused excessive lock times.

The reduction in the size of the database is not the primary benefit, but it does illustrate the amount of garbage-collection that took place. This is necessary to compare the duration of each approach as a wildly different database size could also contribute to the difference in vacuum duration.

For example, vacuuming the second database probably required less I/O since the resulting database size was smaller, indicating less data had to be read/written. However, that should not account for a 93% reduction in duration… maybe 30% based on the relative sizes.

2 Likes

Are you saying I don’t have the right to have an opinion? Perfect

Thanks. I Just ran a modified version of your in place script. It finished pretty fast but my DBs are not very old and I had defraged them recently also. I had planned to eventually vacuum them one at a time when I got bored enough but this script made it easy to do them all at once.

1 Like

Not at all, but creating a strawman (“You also assume your HDs are a benchmark for all SNOs”) and then attacking that is rude and fallacious.

I never set out to prove anything, only provide information about what happened on my systems. Making demands for proof is silly in this context.

Try it yourself and see what happens! I’m interested to see numbers on the difference between the two approaches from other SNOs, regardless of which approach turns out to be generally faster. Include whether it was an HDD or SSD.

6 Likes

5 posts were split to a new topic: Is the vacuum necessary?

4 posts were merged into an existing topic: Is the vacuum necessary?

Alexy decided to split some comments, making it even a more mess.

Spaghetti forum-ing…

Organization of threads is a tough job. Frankly, I’m impressed with the management job here.

5 Likes

The split comments were questioning whether vacuuming is even necessary, while this topic is about vacuuming the databases in a ramdisk to make the vacuum process faster. They didn’t belong in this thread.

Someone told me that vacuum is built into the latest version so its no longer manually necessary - is this accurate information?