How to move DBs to SSD on Docker


Before you begin, please make sure that your SSD has good endurance (MLC is preferred); I personally recommend using an SSD mirror.

  1. Look into the official documentation and make sure that you are using the --mount type=bind parameter in your docker run string.
  2. Prepare a folder on the mounted SSD, outside of <storage-dir> from the official documentation (that is your folder with pieces). See the shell sketch after the docker run examples below.
  3. Add a new mount string to your docker run string:

Now we have:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
    -e EMAIL="user@example.com" \
    -e ADDRESS="domain.ddns.net:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source="<identity-dir>",destination=/app/identity \
    --mount type=bind,source="<storage-dir>",destination=/app/config \
    --name storagenode storjlabs/storagenode:beta

should be:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
    -e EMAIL="user@example.com" \
    -e ADDRESS="domain.ddns.net:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source="<identity-dir>",destination=/app/identity \
    --mount type=bind,source="<storage-dir>",destination=/app/config \
    --mount type=bind,source="<database-dir>",destination=/app/dbs \
    --name storagenode storjlabs/storagenode:beta
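
For step 2, here is a minimal shell sketch of preparing the database folder on the SSD, assuming a Linux host; the mount point /mnt/ssd and the folder name are hypothetical, so use wherever your SSD is actually mounted:

    # assumption: the SSD is mounted at /mnt/ssd (hypothetical path)
    mkdir -p /mnt/ssd/storagenode-dbs
    # confirm the folder really lives on the SSD, not on the HDD with the pieces
    df -h /mnt/ssd/storagenode-dbs

This path is what goes into <database-dir> in the docker run string above.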
  4. Add or change this parameter in your config.yaml:
# directory to store databases. if empty, uses data path
# storage2.database-dir: ""
storage2.database-dir: "dbs"
  5. Stop and remove your storagenode container:
    docker stop storagenode -t 300 && docker rm storagenode

  6. Copy all databases from “storage-dir\storage” to the new location “database-dir”. Do not move them! (If something goes wrong, we can simply fall back to the databases in the old location; otherwise the storagenode will recreate them from scratch.)
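
A sketch of that copy, assuming a Linux host and hypothetical paths (/mnt/hdd/storagenode as <storage-dir> and /mnt/ssd/storagenode-dbs as <database-dir>); substitute your real locations:

    # copy, do not move; -a preserves timestamps and permissions
    cp -a /mnt/hdd/storagenode/storage/*.db /mnt/ssd/storagenode-dbs/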

  7. Start your node with the new docker run .... string.

  8. Make sure that in “database-dir” you see files with .db-shm and .db-wal extensions, as in the screenshot below.

[Screenshot: the database-dir location showing the .db, .db-shm and .db-wal files]

  9. If you can see the .db-shm and .db-wal files in the new location “database-dir”, you can now delete the database files from the old location “storage-dir\storage”.
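
For that last check and the cleanup, a sketch using the same hypothetical paths as above:

    # the -shm and -wal files appear once the node has opened the databases at the new location
    ls /mnt/ssd/storagenode-dbs/*.db-shm /mnt/ssd/storagenode-dbs/*.db-wal
    # only after that check succeeds, remove the old database files
    rm /mnt/hdd/storagenode/storage/*.db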

I’ve been pondering making an instruction for the same reason. I kind of feel that if you don’t know how to figure this one out by yourself, you probably shouldn’t do it. Additionally, while it helps with performance, it definitely adds an additional point of failure to your node. Because now HDD failure is not the only thing that can take your node down; the SSD failing can as well.

So I think if we are going to make an instruction available as a separate post, we should include these warnings and caveats, and probably dissuade people from doing this unless they know it’s absolutely necessary for their setup. I mean, I know what needs to be done and have done it for 2 nodes now. But… I was in a little bit of a hurry on the second one and forgot to copy the db’s. Caught it quickly enough and merged them together again (don’t ask how… it’s a lot of manual work, just avoid this). It’s too easy to mess up and not easy to fix problems afterwards. The only reason I’m doing this is because I’m using devices that almost certainly won’t be able to keep up otherwise after they are vetted and getting the full load. Additionally, I use RAID on the db disk, so there is protection against drive failure to mitigate the risk of relying on more than 1 disk for 1 node.


Made the change myself yesterday and it was pretty easy to figure out after a small amount of trial and error (forgot a space in the config), but it does require some critical thinking and a base understanding beyond just copying commands. I think a setting like this does need to have some sort of technical hurdle to implement due to the risks. Put it on my 3x replicated CEPH SSD pool, so hopefully that affords enough protection in the event of taking a hardware hit.

As an update: whereas before I would see locking multiple times an hour, since moving it 24 hours ago I have seen no database locking errors, so I would say that is a success. I’m seeing some pretty heavy bursts of IO periodically on the SSD volume, which explains why it was locked before; the HDD pool was having a tough time meeting the latency requirements to respond fast enough.
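
If you want to check the same thing on your own node, grepping the container logs for the SQLite lock message is a quick way to do it (a sketch; the exact wording of the error can differ between versions):

    docker logs storagenode 2>&1 | grep -i "database is locked"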


Are all of the databases actually critical for storagenode operation? I thought many of them were just for the dashboard.

If none of them are required (e.g. if you stop the node, delete them, then start the node and everything works fine even if some orders are lost) then it would be a point of failure with regards to uptime, but not durability. That is, if the SSD dies then the node gets I/O errors on the databases, but stopping the node and pointing the DB storage at an empty, working storage location would fix the problem.
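
In that scenario the recovery would look roughly like this sketch, reusing the stop/remove command from the guide above; the replacement directory is a hypothetical example:

    # stop and remove the container, then point the dbs mount at a new, empty folder
    docker stop -t 300 storagenode && docker rm storagenode
    mkdir -p /mnt/other-disk/storagenode-dbs
    # re-run the docker run string with the database mount pointed at the new folder:
    #   --mount type=bind,source=/mnt/other-disk/storagenode-dbs,destination=/app/dbs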

If some of the databases are actually required for operation, I wonder if it could be possible in the future to move only the ephemeral/not-strictly-needed databases to different storage. Those databases could even be stored on a ramdisk.

From what I remember about @littleskunk’s post, your node can survive losing all dbs as long as you have all the data.

Nice. So then what I said is true – it’s a point of failure for uptime (the node would crash and probably refuse to restart while there are I/O errors) but not durability.

Using a ramdisk for the databases is then actually quite feasible, at least for Linux systems that do not reboot frequently! I might investigate making this change on one of my smaller nodes and seeing if it has any impact on metrics.
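
A minimal sketch of that idea on Linux, with a hypothetical mount point and accepting that the databases are gone after every reboot:

    # create a ramdisk for the databases (contents do not survive a reboot)
    mkdir -p /mnt/storagenode-dbs-ram
    mount -t tmpfs -o size=1g tmpfs /mnt/storagenode-dbs-ram
    # then bind-mount it into the container:
    #   --mount type=bind,source=/mnt/storagenode-dbs-ram,destination=/app/dbs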


I wonder if audits would be impacted by I/O errors on the database. If they are, then a node could be suspended/disqualified needlessly. Audits IMO should not even touch the databases, since audit traffic isn’t paid anyway. There’s no reason for audits to be waiting on databases or even attempting to read/write the DB files, no?