Quickest way to move multiple nodes to new disks

I think you should take your nodes offline and then copy them in one go.

My experience with rsync is very mixed. In my view it is not well suited to this kind of task: it has no built-in parallelism, it keeps no record of what has changed between runs (so every run has to rescan everything), and it copes poorly with a huge number of small files.
But sometimes it surprises me.
If your destination is completely empty, rsync may turn out to be really fast. But don't keep the node running: the repeated follow-up runs needed to catch what changed while it was running could eat up your initial speed advantage.

You can try running several instances of rsync in parallel, one for each satellite folder. That would give you 4 for the blobs and 4 for the trash. But then you need to be careful that the correct files land in the correct folders. A final rsync run over the full copy is mandatory then.
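A minimal sketch of the per-folder idea, in case it helps. It assumes that blobs and trash sit directly under the directory you pass in and that each contains one subfolder per satellite; the function name and the paths are made up for illustration, so adjust them to your layout.

```shell
#!/bin/sh
# Sketch only: one rsync per satellite folder, then one pass over everything.
# Assumption: $1 contains blobs/ and trash/, each with one subfolder per satellite.

parallel_rsync() {
    src=$1
    dst=$2
    for sub in blobs trash; do
        [ -d "$src/$sub" ] || continue
        mkdir -p "$dst/$sub"
        for dir in "$src/$sub"/*/; do
            [ -d "$dir" ] || continue          # skip an unmatched glob
            name=$(basename "$dir")
            # Trailing slashes: copy the folder's contents into the
            # destination folder of the same name, nothing else.
            rsync -a "$dir" "$dst/$sub/$name/" &
        done
    done
    wait                                        # let all background copies finish
    # Final pass over the full tree to pick up anything outside blobs/trash
    # and anything a per-folder run missed.
    rsync -a "$src/" "$dst/"
}

# Example call (hypothetical paths):
# parallel_rsync /path/to/source/storagenode /path/to/destination/storagenode
```

Deriving the destination folder name from the source folder name is what keeps the files landing in the correct place; typing eight rsync commands by hand is where mistakes tend to creep in.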

I played around with something different. Something like:

{ time -p (nohup sh -c 'tar cvf - --sort=name --ignore-failed-read -C /path/to/source/storagenode . | pv | xargs -n 1 -P 32 $(tar xvf - --keep-newer-files -C /path/to/destination/storagenode/)' > /root/nohup_tar_.out 2>&1) >> /root/nohup_tar_.out 2>&1 & }

Here the xargs -P setting is meant to set the number of parallel tasks.
It had some hiccups, but it did work. I ran it when rsync was taking ages on a single satellite folder, and used the script to copy the other satellite folders while rsync was busy with that one.
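A single tar stream is read sequentially, so one way to make -P really control the number of concurrent copies is to start one tar pipe per subfolder. The following is a rough sketch of that variant, not the exact command I ran. It assumes GNU tar, find and xargs, and the function name, paths and default job count are placeholders.

```shell
#!/bin/sh
# Sketch only: one "tar | tar" pipe per top-level entry of the source,
# with xargs -P limiting how many pipes run at the same time.
# Assumes GNU tar (--ignore-failed-read, --keep-newer-files), find and xargs.

parallel_tar_copy() {
    src=$1
    dst=$2
    njobs=${3:-4}                 # number of parallel pipes, 4 if not given
    mkdir -p "$dst"
    ( cd "$src" && find . -mindepth 1 -maxdepth 1 -print0 ) |
        xargs -0 -r -n 1 -P "$njobs" sh -c '
            # $1 = source root, $2 = destination root, $3 = entry from find
            tar cf - --ignore-failed-read -C "$1" "$3" |
                tar xf - --keep-newer-files -C "$2"
        ' _ "$src" "$dst"
}

# Example call (hypothetical paths, 8 parallel pipes):
# parallel_tar_copy /path/to/source/storagenode /path/to/destination/storagenode 8
```

The parallelism is limited by the number of entries directly under the source directory. Pointing the function at the blobs folder instead gives one pipe per satellite folder, which is closer to how I split the work between tar and rsync.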

Here too, a final rsync run at the end is mandatory to catch anything that might have been missed.
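For that final run, something along these lines should do, with the node stopped. The --delete flag is my own addition here: it removes files from the destination that no longer exist on the source, so leave it out if you do not want that. The function name and paths are placeholders.

```shell
#!/bin/sh
# Sketch only: final catch-up pass, to be run while the node is offline.

final_sync() {
    src=$1
    dst=$2
    # Copy whatever is still missing or outdated; --delete (an assumption,
    # see above) also removes destination files that are gone on the source.
    rsync -a --delete "$src/" "$dst/" || return 1
    # Dry run with itemized output: lists whatever would still be transferred.
    rsync -a --delete -n -i "$src/" "$dst/"
}

# Example call (hypothetical paths):
# final_sync /path/to/source/storagenode /path/to/destination/storagenode
```

If the dry run still lists files, something changed on the source in between, which usually means the node was not really stopped.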
