How to move DB’s to SSD on Docker

Before you begin, please make sure that your SSD has good endurance (MLC is preferred); I personally recommend using an SSD mirror.

  1. Look into the official documentation and make sure that you are using the --mount type=bind parameter in your docker run string.
  2. Prepare a folder on the mounted SSD outside of <storage-dir> from the official documentation (<storage-dir> is your folder with pieces).
  3. Add a new mount string to your docker run string:

Now we have:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
    -e EMAIL="user@example.com" \
    -e ADDRESS="domain.ddns.net:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source="<identity-dir>",destination=/app/identity \
    --mount type=bind,source="<storage-dir>",destination=/app/config \
    --name storagenode storjlabs/storagenode:beta

should be:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
    -e EMAIL="user@example.com" \
    -e ADDRESS="domain.ddns.net:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source="<identity-dir>",destination=/app/identity \
    --mount type=bind,source="<storage-dir>",destination=/app/config \
    --mount type=bind,source="<database-dir>",destination=/app/dbs \
    --name storagenode storjlabs/storagenode:beta
  4. Add/change the new parameter in your config.yaml:
# directory to store databases. if empty, uses data path
# storage2.database-dir: ""
storage2.database-dir: "dbs"
  5. Stop and remove your storagenode container:
    docker stop storagenode -t 300 && docker rm storagenode

  6. Copy all databases from <storage-dir>/storage to the new location <database-dir>. Do not move them! (If something goes wrong, you can simply start again with the databases in the old location; otherwise the storagenode would recreate them from scratch.)
    It's recommended to copy while preserving permissions, or you should reapply them after the copy is done. The related Linux command is:

cp -p *.db /destination/path
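As a more concrete, hedged sketch: the paths /mnt/storagenode/storage and /mnt/ssd/dbs below are only examples standing in for your <storage-dir>/storage and <database-dir>, and the user/group in the chown line are placeholders for whatever your node runs as.

cp -p /mnt/storagenode/storage/*.db /mnt/ssd/dbs/
# if ownership was not preserved (e.g. copying across filesystems as root),
# reapply it to match the originals; user/group here are placeholders:
chown storagenode-user:storagenode-group /mnt/ssd/dbs/*.db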
  7. Start the node with your new docker run .... string.

  8. Make sure that in the new <database-dir> you see files with .db-shm and .db-wal extensions, like on the screenshot (a quick command-line check is sketched below).
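In lieu of the screenshot, a quick check from the command line (again using the example path /mnt/ssd/dbs for <database-dir>):

ls -la /mnt/ssd/dbs/
# while the node is running, each database should have companion files,
# e.g. bandwidth.db, bandwidth.db-shm and bandwidth.db-wal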

Summary

  1. If you can see the .db-shm and .db-wal files in the new location <database-dir>, you can now delete the database files from the old location <storage-dir>/storage (see the example below).
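A hedged example of that cleanup, reusing the example paths from above; only run it once the node has been confirmed healthy against the new location:

ls /mnt/ssd/dbs/*.db-wal          # confirm the node is really using the new dbs
rm /mnt/storagenode/storage/*.db  # then remove the old copies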
32 Likes

I’ve been pondering making an instruction for the same reason. I kind of feel that if you don’t know how to figure this one out by yourself, you probably shouldn’t do it. Additionally, while it helps with performance, it definitely adds an additional point of failure to your node. Because now HDD failure is not the only thing that can take your node down; the SSD failing can as well.

So I think if we are going to make an instruction available as a separate post, we should include these warnings and caveats. And probably dissuade people from doing this unless they know it’s absolutely necessary for their setup. I mean, I know what needs to be done and have done it for 2 nodes now. But… I was a little in a hurry on the second one and forgot to copy the db’s. Caught it quickly enough and merged them together again (don’t ask how… it’s a lot of manual work, just avoid this). It’s too easy to mess up and not easy to fix problems afterwards. The only reason I’m doing this is because I’m using devices that almost certainly won’t be able to keep up otherwise after they are vetted and getting the full load. Additionally, I use RAID on the db disk, so there is protection against drive failure to mitigate the risk of relying on more than 1 disk for 1 node.

2 Likes

Made the change myself yesterday and it was pretty easy to figure out after a small amount of trial and error (forgot a space in the config), but it does require some critical thinking and a base understanding beyond just copying commands. I think a setting like this does need to have some sort of technical hurdle to implement due to the risks. Put it on my 3x replicated CEPH SSD pool, so hopefully that affords enough protection in the event of taking a hardware hit.

As an update: whereas before I would see locking multiple times an hour, in the 24 hours since moving it I have seen no database locking errors, so I would say that is a success. I'm seeing some pretty heavy bursts of IO periodically on the SSD volume, which definitely explains why it was locked: the HDD pool was having a bit of a tough time meeting the latency requirements to respond fast enough.

2 Likes

Are all of the databases actually critical for storagenode operation? I thought many of them were just for the dashboard.

If none of them are required (e.g. if you stop the node, delete them, then start the node and everything works fine even if some orders are lost) then it would be a point of failure with regards to uptime, but not durability. That is, if the SSD dies then the node gets I/O errors on the databases, but stopping the node and pointing the DB storage at an empty, working storage location would fix the problem.
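A rough sketch of that recovery path, assuming the databases were moved to their own bind mount as in the guide above (paths are illustrative, and the node should recreate missing database files on startup, losing only historical stats):

docker stop storagenode -t 300 && docker rm storagenode
mkdir -p /mnt/other-ssd/dbs       # any healthy, empty location
# re-run your docker run command with the database mount pointed there:
#   --mount type=bind,source=/mnt/other-ssd/dbs,destination=/app/dbs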

If some of the databases are actually required for operation, I wonder if it could be possible in the future to move only the ephemeral/not-strictly-needed databases to different storage. Those databases could even be stored on a ramdisk.

From what I remember about @littleskunk’s post, your node can survive losing all dbs as long as you have all the data.

Nice. So then what I said is true – it’s a point of failure for uptime (the node would crash and probably refuse to restart while there are I/O errors) but not durability.

Using a ramdisk for the databases is then actually quite feasible, at least for Linux systems that do not reboot frequently! I might investigate making this change on one of my smaller nodes and seeing if it has any impact on metrics.
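For anyone who wants to experiment with that, a minimal sketch of the ramdisk variant on Linux (mount point and size are made up for illustration; the contents disappear on every reboot, so only the stats history is at stake):

sudo mkdir -p /mnt/storj-dbs
sudo mount -t tmpfs -o size=1g tmpfs /mnt/storj-dbs
# then bind it into the container in place of <database-dir>:
#   --mount type=bind,source=/mnt/storj-dbs,destination=/app/dbs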


I wonder if audits would be impacted by I/O errors on the database. If they are, then a node could be suspended/disqualified needlessly. Audits IMO should not even touch the databases, since audit traffic isn’t paid anyway. There’s no reason for audits to be waiting on databases or even attempting to read/write the DB files, no?

nice! :nerd_face:
works fine

1 Like

Thank you, this is really interesting.
Is the increase in performance worth the (small but real) risk of adding another point of failure?
Are we looking at a 10% improvement? 20%? 100%? 2%?

Thank you :slight_smile:

i haven’t tested this… but if your system isn’t affected by a lack of iops for the db load… then it might actually decrease your overall performance, ofc this is very unlikely, but because you spread the data out over more storage media you might affect internal bandwidth or whatever…

however if you are critically affected by lack of IOPS for the storagenode… then you might see 1000% improvement in what your node can keep up with… and it could be much better than this…

this is down to the fact that when something like a conventional hdd or smr hdd gets behind, it will add up latency into the second range… while a normal seek latency should be 6ms

so that alone could reduce your latency by a factor of 200, and then when the smr writes at its slowest you get like 700kb/s… so you could essentially end up waiting 1.2 sec to write out a few kb … it can slow your system to a crawl…

but if you don’t have that issue… and everything runs smooth… you most likely will feel a limited effect…

Thank you for your insight.
No SMR on any of my drives and I’m gradually changing them from 5900RPM to 7200RPM Exos ones, so hopefully that’ll help a fraction.

Might put in an SSD for the db when the nodes start getting fuller :slight_smile:

i run zfs with a slog ssd and sync=always on the storagenode dataset… so essentially my system is already writing db changes to an ssd, ofc it ends up on the spindles… but i also run dual raidz1 on 7200rpm drives, so i get double the IOPS of somebody running a single 7200 per node.

the ssd option is a great solution for somebody running many nodes located on many hdd’s on one system… ofc putting multiple databases on the same ssd… well that’s a collective point of failure,
so you have to be able to really depend on it if you run many nodes like that…

if i was to do that beyond 3-4 nodes i would run a mirrored ssd setup and keep it well monitored in case of failure, maybe backup the db to the individual node drive every hour… or so… ofc that in itself may defeat the point, but a write like that should be sequential and thus not take more than a sec or two out of an hour of full performance.
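For anyone tempted by that hourly-backup idea, one hedged way to do it without stopping the node is SQLite's online backup; the paths and the cron location below are just examples:

#!/bin/sh
# example: /etc/cron.hourly/storj-db-backup (make it executable)
mkdir -p /mnt/hdd/node1/db-backup
for db in /mnt/ssd/dbs/*.db; do
    sqlite3 "$db" ".backup '/mnt/hdd/node1/db-backup/$(basename "$db")'"
done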

in the end all this stuff sort of becomes math… :smiley: but then again so does everything.

1 Like

If you’re not having actual issues, it’s definitely not worth it. If you’re having issues, wait until you have 1.6.3 to see if they remain, as one of the major issues with slower HDDs will be solved in that one.

The last thing you want to do is move all db’s of several nodes to one SSD. That would create a single point of failure and you could lose all your nodes in one go.

In short, don’t fix things that aren’t broken.

3 Likes

Fair point, I’d forgotten about the changes coming in 1.6.3

We just want to eke every last little bit of performance out of our systems, I guess :slight_smile:

True that!
Well actually, some of us just want their system not to crash… :sweat_smile:

2 Likes

So I tried this now on one small node and it seems to work so far. But I was also wondering about the “orders” directory: could I move that too?

Yes, you can. If you already moved the databases to the SSD, you can use the same folder and create a subfolder for orders.

Just stop the storage node, move the orders folder to the new location, and change the path in config.yaml (a hedged sketch follows below):
storage2.orders.path: dbs/orders
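A rough sketch of that sequence, reusing the example paths from earlier (the existing orders folder typically sits inside your <storage-dir>; adjust to your own layout):

docker stop storagenode -t 300 && docker rm storagenode
mv /mnt/storagenode/orders /mnt/ssd/dbs/orders
# set storage2.orders.path in config.yaml as shown above,
# then start the node again with your usual docker run command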

3 Likes

oh thanks! somehow I didn’t see that option… just too many options in here :smiley:

2 Likes

It was added later, the config.yaml isn’t updated with new options automatically.

yes but I started with my latest node where the option was already there, just didn’t see it :smiley:
But then I retrofitted the older nodes with those options.
Not that I see any significant difference on my disks since… But wanted to try it anyway since my SSD is running in a mirror. But since there’s not much ingress at the moment, there’s not much to see anyway. Just wanted to take some load off my HDDs, especially as I wasn’t running the DBs in an efficient way due to them running in a zfs dataset with recordsize 512KB with all the storagenode pieces. So I was curious if a change was even visible without additional tools. But I guess the logging needs more iops than the DBs xD So for SMRs setting the log level to warn would probably make more of a difference :smiley:
But I’m getting off-topic now, sorry

Yeah, I haven’t bothered with it. It’s nothing like the db load. Though I have redirected my logs to the db location for my slow external HDD nodes.