How to move DBs to SSD on Docker
Before you begin, please make sure that your SSD has good endurance (MLC is preferred). I personally recommend using an SSD mirror.
- Look into the official documentation and make sure that you are using the --mount type=bind parameter in your docker run string.
- Prepare a folder on the mounted SSD, outside of the <storage-dir> from the official documentation (that is your folder with pieces).
- Add a new mount string to your docker run string:
Now we have:
docker run -d --restart unless-stopped --stop-timeout 300
--name storagenode storjlabs/storagenode:beta
With the database mount added, it becomes:
docker run -d --restart unless-stopped --stop-timeout 300 \
-p 28967:28967 \
-p 127.0.0.1:14002:14002 \
-e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
-e EMAIL="firstname.lastname@example.org" \
-e ADDRESS="domain.ddns.net:28967" \
-e STORAGE="2TB" \
--mount type=bind,source="<identity-dir>",destination=/app/identity \
--mount type=bind,source="<storage-dir>",destination=/app/config \
--mount type=bind,source="<database-dir>",destination=/app/dbs \
--name storagenode storjlabs/storagenode:beta
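- Add or change the following parameter in your config.yaml, uncommenting it and setting it to the container-side path of the new mount (/app/dbs in the docker run string above):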
# directory to store databases. if empty, uses data path
# storage2.database-dir: ""
- Stop and remove your storagenode container:
docker stop storagenode -t 300 && docker rm storagenode
- Copy all databases from “storage-dir\storage” to the new location “database-dir”. Do not move them! (If something goes wrong, we can simply start with the databases in the old location instead of having the storagenode recreate them.)
- Start the node with your new docker run ... string.
- Make sure that you see files with the .db-shm and .db-wal extensions in database-dir, as in the screenshot.
- If you can see .db-shm and .db-wal files in the new location “database-dir”, you can now delete the database files from the old location “storage-dir\storage”.
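The copy-and-verify part of the steps above can be sketched as a few shell functions. This is a minimal sketch under assumptions, not an official tool: the helper names (check_db_dir, copy_dbs, verify_wal, set_db_dir) are hypothetical, the databases are assumed to sit in <storage-dir>/storage, and the config value /app/dbs is assumed to match the --mount destination in the docker run string. Run the copy only after the container has been stopped and removed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the manual steps above. Run them only after
# "docker stop storagenode -t 300 && docker rm storagenode".

# Fail if database-dir resolves to a path inside storage-dir.
check_db_dir() {
  local storage_dir db_dir
  storage_dir=$(cd "$1" && pwd -P) || return 1
  db_dir=$(cd "$2" && pwd -P) || return 1
  case "$db_dir/" in
    "$storage_dir"/*) echo "database-dir is inside storage-dir" >&2; return 1 ;;
  esac
  return 0
}

# Copy (never move) each *.db plus any -wal/-shm companions, so the
# originals stay usable as a fallback.
copy_dbs() {
  local src="$1/storage" dst="$2" f ext
  for f in "$src"/*.db; do
    [ -e "$f" ] || { echo "no .db files found in $src" >&2; return 1; }
    for ext in "" -wal -shm; do
      if [ -e "$f$ext" ]; then
        cp -p "$f$ext" "$dst"/ || return 1
      fi
    done
  done
}

# True if the first argument (an expanded glob) names an existing file.
exists_any() { [ -e "$1" ]; }

# After the new container starts, database-dir should hold -wal and -shm files.
verify_wal() {
  exists_any "$1"/*.db-wal && exists_any "$1"/*.db-shm
}

# Set storage2.database-dir to the container-side mount point (assumed
# /app/dbs), keeping the commented-out default and the file's permissions.
set_db_dir() {
  local cfg="$1" tmp
  tmp=$(mktemp) || return 1
  grep -v '^storage2\.database-dir:' "$cfg" > "$tmp"
  echo 'storage2.database-dir: /app/dbs' >> "$tmp"
  cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

For example, with hypothetical host paths: check_db_dir /mnt/hdd/storagenode /mnt/ssd/storagenode-dbs && copy_dbs /mnt/hdd/storagenode /mnt/ssd/storagenode-dbs && set_db_dir /mnt/hdd/storagenode/config.yaml. After starting the new container, delete the old copies only once verify_wal /mnt/ssd/storagenode-dbs succeeds. The -wal/-shm companions are copied too in case the node did not shut down cleanly, since a leftover WAL can hold committed changes that are not yet in the .db file.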
I’ve been pondering writing instructions for the same reason. I kind of feel that if you don’t know how to figure this one out by yourself, you probably shouldn’t do it. Additionally, while it helps with performance, it definitely adds another point of failure to your node: HDD failure is no longer the only thing that can take your node down, because the SSD failing can as well.
So I think if we are going to make the instructions available as a separate post, we should include these warnings and caveats, and probably dissuade people from doing this unless they know it’s absolutely necessary for their setup. I mean, I know what needs to be done and have done it for 2 nodes now. But… I was in a bit of a hurry on the second one and forgot to copy the DBs. I caught it quickly enough and merged them together again (don’t ask how… it’s a lot of manual work, just avoid this). It’s too easy to mess up and not easy to fix problems afterwards. The only reason I’m doing this is that I’m using devices that almost certainly won’t be able to keep up otherwise once they are vetted and getting the full load. Additionally, I use RAID on the DB disk, so there is protection against drive failure to mitigate the risk of relying on more than one disk for one node.
Made the change myself yesterday and it was pretty easy to figure out after a small amount of trial and error (forgot a space in the config), but it does require some critical thinking and a basic understanding beyond just copying commands. I think a setting like this does need some sort of technical hurdle to implement, due to the risks. I put it on my 3x replicated Ceph SSD pool, so hopefully that affords enough protection in the event of a hardware hit.
As an update: whereas before I would see database locking multiple times an hour, I have had no database locking errors since moving the databases 24 hours ago, so I would say that is a success. I’m seeing some pretty heavy bursts of IO periodically on the SSD volume, which definitely explains the locking: the HDD pool was struggling to meet the latency requirements for responding fast enough to those bursts.
Are all of the databases actually critical for storagenode operation? I thought many of them were just for the dashboard.
If none of them are required (e.g. if you stop the node, delete them, then start the node and everything works fine even if some orders are lost), then it would be a point of failure with regard to uptime, but not durability. That is, if the SSD dies then the node gets I/O errors on the databases, but stopping the node and pointing the DB storage at an empty, working storage location would fix the problem.
If some of the databases are actually required for operation, I wonder if it could be possible in the future to move only the ephemeral/not-strictly-needed databases to different storage. Those databases could even be stored on a ramdisk.
From what I remember about @littleskunk’s post, your node can survive losing all dbs as long as you have all the data.
Nice. So then what I said is true – it’s a point of failure for uptime (the node would crash and probably refuse to restart while there are I/O errors) but not durability.
Using a ramdisk for the databases is then actually quite feasible, at least for Linux systems that do not reboot frequently! I might investigate making this change on one of my smaller nodes and seeing if it has any impact on metrics.
I wonder if audits would be impacted by I/O errors on the database. If they do then a node could be suspended/disqualified needlessly. Audits IMO should not even touch the databases since audit traffic isn’t paid anyway. There’s no reason for audits to be waiting on databases or even attempting to read/write on the DB files, no?
Thank you, this is really interesting.
Is the increase in performance worth the (small but real) risk of adding another point of failure?
Are we looking at a 10% improvement? 20%? 100%? 2%?
I haven’t tested this… but if your system isn’t affected by a lack of IOPS for the DB load… then it might actually decrease your overall performance. Of course this is very unlikely, but because you spread the data out over more storage media, you might affect internal bandwidth or whatever…
However, if your storagenode is critically affected by a lack of IOPS… then you might see a 1000% improvement in what your node can keep up with… and it could be much better than this…
This is because when something like a conventional HDD or an SMR HDD falls behind, latency can add up into the range of seconds… while a normal seek latency should be 6 ms.
So that alone could reduce your latency by a factor of 200, and when the SMR drive writes at its slowest you get something like 700 kb/s… so you could essentially end up waiting 1.2 sec to write out a few kb… it can slow your system to a crawl…
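The factor of 200 is simply the ratio of the two latencies quoted above, a 1.2 s stall against a normal 6 ms seek:

```latex
\frac{1.2\ \mathrm{s}}{6\ \mathrm{ms}} = \frac{1200\ \mathrm{ms}}{6\ \mathrm{ms}} = 200
```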
But if you don’t have that issue… and everything runs smoothly… you will most likely see a limited effect…
Thank you for your insight.
No SMR on any of my drives and I’m gradually changing them from 5900RPM to 7200RPM Exos ones, so hopefully that’ll help a fraction.
Might put in an SSD for the db when the nodes start getting fuller
I run ZFS with a SLOG SSD and sync=always on the storagenode dataset… so essentially my system is already writing DB changes to an SSD; of course they still end up on the spindles… but I also run dual raidz1 on 7200 RPM drives, so I get double the IOPS of somebody running a single 7200 RPM drive per node.
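For comparison, the ZFS approach described above (a separate log device plus sync=always on the node’s dataset) boils down to two commands. This sketch only prints them as a dry run; the function name and the pool, dataset, and device names are hypothetical placeholders, and the syntax should be checked against zpool(8) and zfs(8) before running anything for real. With sync=always every write to that dataset goes through the ZFS intent log, which then lives on the SSD.

```bash
#!/usr/bin/env bash
# Dry run: print (do not execute) the commands that add an SSD log device
# (SLOG) to a pool and force synchronous writes on the node's dataset.
# Usage: zfs_slog_plan POOL DATASET DEVICE [DEVICE2]
zfs_slog_plan() {
  local pool="$1" dataset="$2"
  shift 2
  if [ "$#" -ge 2 ]; then
    # Two or more devices: mirror the log, so one SSD failure is survivable.
    echo "zpool add $pool log mirror $*"
  else
    echo "zpool add $pool log $1"
  fi
  echo "zfs set sync=always $pool/$dataset"
}

# Example: zfs_slog_plan tank storagenode nvme0n1 nvme1n1
```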
The SSD option is a great solution for somebody running many nodes on many HDDs in one system… of course, putting multiple nodes’ databases on the same SSD makes it a collective point of failure.
So you have to be able to really depend on it if you run many nodes like that…
If I were to do that beyond 3-4 nodes, I would run a mirrored SSD setup and keep it well monitored in case of failure, and maybe back up the DBs to each node’s own drive every hour or so… Of course that in itself may defeat the point, but a write like that should be sequential and thus not take more than a second or two out of an hour of full performance.
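The hourly backup idea could look roughly like this. It is a sketch assuming the sqlite3 command-line tool is installed on the host; the function name, script name, and paths are hypothetical. It uses sqlite3’s .backup command rather than cp, because copying a database file while the node is writing to it can produce an inconsistent copy.

```bash
#!/usr/bin/env bash
# Back up every SQLite database in SRC (the SSD) into DST (e.g. the node's
# own HDD) using the sqlite3 CLI's .backup command, which copies a
# consistent snapshot even while the storagenode has the file open.
# Note: a path containing a single quote would break the quoted argument.
backup_dbs() {
  local src="$1" dst="$2" f
  command -v sqlite3 >/dev/null || { echo "sqlite3 not found" >&2; return 1; }
  mkdir -p "$dst" || return 1
  for f in "$src"/*.db; do
    [ -e "$f" ] || continue
    sqlite3 "$f" ".backup '$dst/$(basename "$f")'" || return 1
  done
}

# Hypothetical crontab entry running a script that calls backup_dbs hourly:
# 0 * * * * /usr/local/bin/storj-db-backup.sh
```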
In the end all this stuff sort of becomes math… but then again, so does everything.
If you’re not having actual issues, it’s definitely not worth it. If you are having issues, wait until you have 1.6.3 to see whether they remain, as one of the major issues with slower HDDs will be solved in that release.
The last thing you want to do is move all db’s of several nodes to one SSD. That would create a single point of failure and you could lose all your nodes in one go.
In short, don’t fix things that aren’t broken.
Fair point, I’d forgotten about the changes coming in 1.6.3
We just want to eke every last little bit of performance out of our systems, I guess.
Well actually, some of us just want their system not to crash…
So I’ve now tried this on one small node and it seems to work so far, but I was wondering about the “orders” directory. Could I move that too?
Yes, you can. If you have already moved the databases to the SSD, you can use the same folder and create a subfolder for orders.
Just stop the storagenode, move the orders folder to the new location, and change the path in config.yaml.
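A sketch of that orders move, assuming the orders folder sits directly in <storage-dir> (mounted as /app/config), that <database-dir> is mounted at /app/dbs, and that the key is storage2.orders.path (check the exact name in your own config.yaml). The function name is hypothetical; run it only while the container is stopped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move <storage-dir>/orders into <database-dir>/orders
# and point config.yaml at the new container-side path.
move_orders() {
  local storage_dir="$1" db_dir="$2" cfg tmp
  cfg="$storage_dir/config.yaml"
  [ -d "$storage_dir/orders" ] || { echo "no orders folder" >&2; return 1; }
  [ -e "$db_dir/orders" ] && { echo "$db_dir/orders already exists" >&2; return 1; }
  mv "$storage_dir/orders" "$db_dir/orders" || return 1
  tmp=$(mktemp) || return 1
  # Drop any existing uncommented storage2.orders.path line, then append ours.
  grep -v '^storage2\.orders\.path:' "$cfg" > "$tmp"
  # Assumed container-side path: <database-dir> mounted at /app/dbs.
  echo 'storage2.orders.path: /app/dbs/orders' >> "$tmp"
  cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Example (hypothetical host paths):
#   move_orders /mnt/hdd/storagenode /mnt/ssd/storagenode-dbs
```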
Oh, thanks! Somehow I didn’t see that option… just too many options in here.
It was added later; the config.yaml isn’t updated with new options automatically.
Yes, but I started with my latest node, where the option was already there; I just didn’t see it.
But then I retrofitted the older nodes with those options.
Not that I’ve seen any significant difference on my disks since… but I wanted to try it anyway, since my SSD is running in a mirror. And since there’s not much ingress at the moment, there’s not much to see anyway. I just wanted to take some load off my HDDs, especially as I wasn’t running the DBs efficiently: they were in a ZFS dataset with a 512KB recordsize, together with all the storagenode pieces. So I was curious whether a change would even be visible without additional tools. But I guess the logging needs more IOPS than the DBs xD So for SMR drives, setting the log level to warn would probably make more of a difference.
But I’m getting off-topic now, sorry
Yeah, I haven’t bothered with it. It’s nothing like the DB load. That said, I have redirected my logs to the DB location for my slow external HDD nodes.
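Both log tweaks from the last few posts are config.yaml changes. This sketch assumes the keys are log.level and log.output, as in the storagenode config.yaml; the helper name and the /app/dbs/node.log path are hypothetical, and the container has to be restarted for the change to take effect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set (or replace) a top-level key in config.yaml.
# Comment lines and all other keys are kept unchanged.
set_config_key() {
  local cfg="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  # Drop any existing uncommented "key:" line (literal match via index()).
  awk -v k="$key:" 'index($0, k) != 1' "$cfg" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s: %s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Example calls (host path is hypothetical; restart the container afterwards):
#   set_config_key /mnt/hdd/storagenode/config.yaml log.level warn
#   set_config_key /mnt/hdd/storagenode/config.yaml log.output /app/dbs/node.log
```

Keep in mind that once log.output points at a file, docker logs storagenode will no longer show most of the node’s log.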