I have a bunch of nodes and tested runtime of vacuuming the databases in-place against copying them to a ramdisk, vacuuming them there, and moving them back to the disk. The results are quite impressive.
The orders database takes the overwhelming majority of the time so I’m including its size before and after. Both nodes run on the same host (two Docker containers). They run on different HDDs; both are exactly the same model.
Node 1: 300MB orders database. Vacuuming all databases in-place took 129 seconds. After, the orders database was 157MB.
Node 2: 380MB orders database. Moving all database to a ramdisk, vacuuming them, and moving them back took 9 seconds. After, the orders database was 95MB.
That’s a whopping 93% reduction in the downtime required to vacuum databases. I would guess based on this that the majority of the in-place vacuum is spent waiting for the HDD to seek back and forth between the old and new databases, and possibly performing out-of-order reads on the old database. Even with the overhead of moving the databases to and from a ramdisk, the ramdisk approach is substantially faster most likely because each file is read or written once and in-order.
In-place vacuum script:
docker-compose stop -t 300 && \ for i in data/storage/*.db; do echo "$i" sqlite3 "$i" 'VACUUM;' done && \ docker-compose up -d
Ramdisk vacuum script (assumes that
/dev/shm is a tmpfs, which is the default on many Linux distributions):
docker-compose stop -t 300 && \ mkdir /dev/shm/storj && \ cp -av data/storage/*.db /dev/shm/storj/ && \ for i in /dev/shm/storj/*.db; do echo "$i" sqlite3 "$i" 'VACUUM;' done && \ mv -v /dev/shm/storj/*.db data/storage/ && \ docker-compose up -d