Hi there.
I know there are many topics (and documentation) about moving a node to a new disk.
Here’s what I did so far:
I had a Windows node with one HDD. I moved that HDD into a different machine and am now running it with Docker Desktop for Windows; it works just fine after some messing about with the config and data structure as per the documentation.
I now want to move the node to a larger HDD, which in and of itself isn’t that hard to do. The data in the node is millions of tiny files though, which will take weeks to copy over. I can’t turn off the node for that long or it would be disqualified.
I tried using SyncBackFree and that failed during the source-scanning phase, at around 3.2 million files, due to lack of memory on the machine.
Now I’m using robocopy to copy the files over.
Initial plan was to turn off the node, copy the files and start the node again.
With these millions of files it would take literal weeks to copy the data, though, so the current plan is to run a full robocopy now while the node is running, then turn the node off and copy only the new/changed files over using the same robocopy mirror command.
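Roughly like this, with placeholder paths and log locations (the /MT flag speeds up the small-file copy considerably):

```
REM Pass 1, while the node is still running (anything that changes meanwhile is picked up later)
robocopy D:\storagenode E:\storagenode /MIR /MT:16 /R:1 /W:1 /NFL /NDL /LOG:C:\logs\robocopy-pass1.log

REM Stop the node, then run the exact same command again: only the delta gets copied,
REM and /MIR also removes anything on the destination that was deleted on the source
robocopy D:\storagenode E:\storagenode /MIR /MT:16 /R:1 /W:1 /NFL /NDL /LOG:C:\logs\robocopy-final.log
```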
Is this the most efficient way to move a node from one HDD to another?
We often see people set their node to not accept any new uploads… then rsync it a couple times while it’s still running. That gets the majority of the millions of files copied… with no downtime. Then they turn off the node, run rsync one last time (with the delete option), then turn it on pointed to the new location. (And allow it to accept uploads again)
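In practice that looks something like this (paths and container name are placeholders; the trailing slashes matter to rsync):

```
# A couple of passes while the node is still running; each pass only transfers the delta
rsync -a --info=progress2 /mnt/old/storagenode/ /mnt/new/storagenode/
rsync -a --info=progress2 /mnt/old/storagenode/ /mnt/new/storagenode/

# Stop the node, then one final pass with --delete so files removed on the
# source since the first pass are also removed from the copy
docker stop -t 300 storagenode
rsync -a --delete --info=progress2 /mnt/old/storagenode/ /mnt/new/storagenode/
```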
There’s nothing safe about using hashstore now. None of the recovery tools have been released for it yet: anyone testing it is one fairly-insignificant corruption away from losing their entire node.
When you clone the disk you are not copying small files. The entire partition gets streamed sequentially. The content of the filesystem is irrelevant; it’s just one big blob.
(On the hash store: I’m strongly against the whole idea. I prefer small files on a filesystem, but it’s not my project, so they can play with whatever they want. I just won’t jump ahead and proactively make things worse.)
You’ve got the devs who want to add database features to the filesystem, and the devs who want to add filesystem features to the database. Both are convinced their way is faster. Neither recognizes that the problem is trying to mix two approaches that work fine all by themselves.
I tend to side with you: filesystems are robust and reliable. And making them faster by throwing hardware at them (like SSDs) is also well understood.
But… part of being a software developer is the excitement of trying to do something fancy.
It’s smart. The one issue I’m having now wouldn’t be fixed by that though (sorry, I should have led with that): the Windows user on my current setup has no access to the node’s files. I’m actually running Docker Desktop as Administrator at the moment to keep the node running. That was the issue that prompted me to copy the files instead. Setting permissions on this many files also takes ages, so I figured I might as well move the entire thing to a larger HDD at the same time.
To reiterate: as long as you’re okay keeping the same filesystem, cloning the partition is a much faster way to go. (You can then expand the partition to fit.)
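A rough sketch of the clone-and-expand route from a Linux live environment (device names are placeholders; double-check them with lsblk first, since dd will happily overwrite the wrong disk, and keep both disks unmounted):

```
# Stream the whole source disk sequentially onto the bigger destination disk
dd if=/dev/sdX of=/dev/sdY bs=64M status=progress conv=noerror,sync

# Grow partition 1 to fill the new disk (growpart is in the cloud-guest-utils package)
growpart /dev/sdY 1

# Grow the filesystem to fill the partition (ext4 shown; for NTFS use ntfsresize /dev/sdY1 instead)
e2fsck -f /dev/sdY1
resize2fs /dev/sdY1
```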
Another thing I stumbled into… I had a smallish (2TB) node, and instead of rsyncing it directly to the new hard drive, I did it first to an SSD that had available space, and then from that SSD to the new drive. It seemed possibly faster. (It took about a day for the first sync to the SSD, several hours for the second sync, and then several hours but less than a day to move to the new drive.)
Is your node on NTFS? Did you try running the terminal as admin for robocopy?
FYI, NTFS is generally not recommended for Linux, but I suppose mounting ext4 from Docker Desktop on startup might be a challenge.
Personally, I went from a Windows node to a Hyper-V VM running Ubuntu with direct HDD access, formatted as ext4 from within Linux. Fast, flexible and reliable. I can also move the drives as-is to another Linux computer or VM easily. I think it’s a better long-term solution if you need to run on Windows.
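For reference, the disk-passthrough part looks roughly like this in an elevated PowerShell (disk number and VM name are placeholders; check Get-Disk first):

```
# The physical disk must be offline in Windows before Hyper-V will pass it through
Set-Disk -Number 2 -IsOffline $true

# Attach the raw disk to the VM's SCSI controller
Add-VMHardDiskDrive -VMName "storj-ubuntu" -ControllerType SCSI -DiskNumber 2
```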
Docker Desktop for Windows will use a network filesystem, depending on the backend: with WSL2 it’s 9p, with Hyper-V it’s SMB/CIFS. So Linux wouldn’t access NTFS directly in either case.
Even though network filesystems are not supported, they may work pretty well locally (within the same PC), and both of these network filesystems work even with SQLite databases. The disclaimer is that between different machines they may (and likely will) behave differently than they do locally.
If you use a Linux VM, you can run it without Docker Desktop at all: just use a normal Docker installation. The disadvantage is that you need to reserve RAM and CPUs for that VM, so it’s less flexible. But it also solves the autostart problem (you can enable the VM to start automatically after reboot), so you do not need to log in to have it running.
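Roughly, inside the VM (mount paths are placeholders and the run command is abbreviated; the full set of flags is in the official setup instructions):

```
# Plain Docker engine, no Docker Desktop involved
curl -fsSL https://get.docker.com | sh

# Abbreviated storagenode run command; add the usual -e WALLET/ADDRESS/STORAGE/EMAIL
# variables and port mappings from the official docs
docker run -d --restart unless-stopped --name storagenode \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  storjlabs/storagenode:latest
```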
I personally solved this by using Rancher Desktop instead of Docker Desktop, because you can run it from the scheduler with an “at start” trigger, so my nodes are up and running automatically without needing to log in after a reboot. Docker Desktop was able to work this way too, but starting with some version they broke this ability, and when it’s started from the scheduler it doesn’t work at all.
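The scheduler part is just a task with an “at system startup” trigger, something like this (task name, account and install path are placeholders; you’ll be prompted for the account password so the task can run without a logged-in session):

```
REM Create a task that starts Rancher Desktop at boot under a specific account
schtasks /Create /TN "RancherDesktop" /SC ONSTART ^
  /RU "%USERNAME%" /RP * ^
  /TR "\"C:\Program Files\Rancher Desktop\Rancher Desktop.exe\""
```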
We recommend version 2.1.0.5 of Docker Desktop with the Hyper-V engine; it also works stably on macOS.
For WSL2 you may upgrade safely. However, I do not know which version broke the ability to run it from the Windows scheduler.
but your node is down for the whole time you are cloning the disk.
Running rsync multiple times allows you to get down to literally a minute of downtime. I tested that; it works.
You’re not wrong: but cloning partitions is a sequential transfer, so the drive will sustain hundreds of MB/s. You can probably clone any size of drive in less than a day. I’d rather eat a day of downtime and have the transfer done and over with… than manage multiple rsyncs over a week and have it eat my attention.
Rsync can help you avoid downtime… if you don’t have a much faster option that makes downtime inconsequential.
it definitely did not take me a week to sync an 8TB volume
And there is nothing to do: just run rsync in a loop a few times. Not sure how you use your computers such that running rsync eats your attention. Are you running an RFC 2549-compliant network?
Also, you can start by taking a snapshot (on ZFS), which is instant, then send that, which also uses maximum bandwidth, then rsync for the leftover differences.
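With placeholder pool/dataset names, that’s roughly:

```
# Instant snapshot of the live dataset, then stream it to the new pool at full disk bandwidth
zfs snapshot tank/storagenode@migrate1
zfs send tank/storagenode@migrate1 | zfs receive newtank/storagenode

# With the node stopped, rsync the leftover differences
# (or take a second snapshot and do an incremental `zfs send -i` instead)
rsync -a --delete /tank/storagenode/ /newtank/storagenode/
```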
I see it differently. Several rsync passes in a script take up none of my time or attention. Then it would email me when it’s done. I’d then have a few days to come back and press Enter in the script; it would stop the node and run the last rsync pass. And then I would swap the disk.
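As a hypothetical sketch of that script (paths, email address and container name are placeholders; mail comes from mailutils or bsd-mailx):

```
#!/bin/bash
# A few passes while the node keeps running, then wait for me to press Enter
SRC=/mnt/old/storagenode/
DST=/mnt/new/storagenode/

for pass in 1 2 3; do
    rsync -a "$SRC" "$DST"
done

echo "Initial rsync passes finished" | mail -s "node migration ready for final pass" me@example.com

read -r -p "Press Enter to stop the node and run the final pass... "
docker stop -t 300 storagenode
rsync -a --delete "$SRC" "$DST"
echo "Done. Repoint the node at $DST and start it again."
```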
With downtime and cloning you have to pretty much babysit the whole process. It completes faster but takes more of your time (connecting disks, and maybe you don’t have a spare port on the source server; running the cloning software; reconfiguring disks afterwards). It’s time-sensitive: you have to be around by the time the cloning finishes, so it interferes with your other plans by creating these two connected points in time, and it results in downtime. It’s worse in every respect.
It’s fine even if cloning can finish in four hours and rsync takes five weeks, because you are not involved in those five weeks at all.
Heh. In ~/MyPetProjects — maybe. Not in production.
At some point, writing ultra-reliable, ultra-boring code becomes more exciting than jumping on every shiny new thing, because stability for years without regressions is way more exciting than any possible fancy new features. New features are an (often un-)necessary evil: they destabilize everything by necessity.
You can look at the Duplicacy and Kopia backup programs as an illustration. Duplicacy barely evolves; it’s simple and solid like a glacier. Kopia exists solely to tickle its developer’s ego. It should be in the Wikipedia article on feature creep. And it’s unstable, corrupts its datastore, and has been in permanent pre-alpha for years. Cool features, but nobody wants to use it.
Actually, we have some customers who use Kopia and it doesn’t work well for them; however, they stick with it for some reason (of course, I suggested better alternatives to them, like the aforementioned Duplicacy).