Best Practice for Transferring Data To New Drive

Yes, but usually you copy the data to a bigger HDD, and it is better to keep it defragmented; this way you get two in one, it just takes a little bit longer.

It does not matter how long it runs, because your node stays online for the whole duration.

A disk clone requires downtime. If you are cloning a 10 TB disk, your node will have to be offline for about 15 hours.

What's the benefit?

It's like saying “I'm not using my dishwasher because it takes 2 hours and I can wash the same dishes in 10 minutes” and missing the point that while it spends those 2 hours, you don't have to spend your 10 minutes.

There are better ways than rsync and disk cloning, if your filesystem supports snapshots. You then use the same approach as with rsync but instead of files you send a few incremental snapshots. The data is transferred at full speed.
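
For example, with ZFS it could look roughly like this (a sketch only; the pool, dataset, and snapshot names tank/storagenode, newpool/storagenode, @migrate1 and so on are made up, and the target dataset should not be modified between receives, or you will need zfs receive -F):
# initial snapshot and full send while the node keeps running
$ zfs snapshot tank/storagenode@migrate1
$ zfs send tank/storagenode@migrate1 | zfs receive newpool/storagenode
# one or more incremental passes to catch up on newly written data
$ zfs snapshot tank/storagenode@migrate2
$ zfs send -i tank/storagenode@migrate1 tank/storagenode@migrate2 | zfs receive newpool/storagenode
# stop the node, send the last (tiny) increment, then start the node from the new pool
$ zfs snapshot tank/storagenode@final
$ zfs send -i tank/storagenode@migrate2 tank/storagenode@final | zfs receive newpool/storagenode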

What? Your filesystem does not support sending and receiving snapshots? Throw it away.


This is still faster than the final rsync run, which requires the node to be shut down.

I guess you never did a node transfer… :wink:

If you are on Linux, use LVM; you can upgrade to new drives with little to no downtime using pvmove.


I've done a few and the final rsync never took 15 hours (they were 7 TB nodes, though, so I guess larger ones might)

This means you need to run more rsync passes before shutting down the node. Downtime should then be a few minutes at most.
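
Roughly like this (only a sketch; the paths and the container name storagenode are examples, adjust them to your setup):
# run as many passes as needed while the node stays online; each pass gets shorter
$ rsync -aP /mnt/old/storagenode/ /mnt/new/storagenode/
$ rsync -aP /mnt/old/storagenode/ /mnt/new/storagenode/
# stop the node, then the final pass only has to pick up the last few minutes of changes
$ docker stop -t 300 storagenode
$ rsync -aP --delete /mnt/old/storagenode/ /mnt/new/storagenode/
# point the node to the new location and start it again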

Lol. Zfs: moving storagenode to another pool, fast, and with no downtime

And the same approach unattended: Received Node Disqualified Email But Node Still 100% Audit 100% Suspension - #23 by arrogantrabbit


Maybe if both disks are local and all metadata is already cached in RAM… In real life I have seen anywhere from half a day to several days for the last rsync run.

Why would all metadata not be in RAM, when not only was the node still running, but you had also just completed the previous rsync pass?


Honestly, I don't know how much RAM is needed to cache tens of millions of files. Maybe 64 GB wasn't enough.
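
As a very rough ballpark (assuming something on the order of 1 KB of cached inode and dentry data per file, which is only a guess and varies by filesystem): 30 million files would already be around 30 GB of metadata cache, so 64 GB could plausibly fall short once the node itself and the regular page cache are competing for the same RAM.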

:grin:

But does zfs support FALLOC_FL_COLLAPSE_RANGE?

Can confirm, did so several times already. Zero downtime migration at pretty much full sequential speed.

Potato hardware not having enough RAM to store all metadata.


That sounds very interesting. Are there any sort of instructions on how to do that? :slight_smile:

That seems like too much complexity for my simple mind… :sweat_smile:

It's not difficult, but it requires reading some documentation.
I would emulate this on loop devices (a short summary for real disks follows after the walkthrough):

  1. Create images for the disks
$ truncate -s 1G disk1.raw
$ truncate -s 1G disk2.raw
  2. Create loop devices to emulate disks
$ sudo losetup -f disk1.raw
$ sudo losetup -f disk2.raw
  3. Find which loop devices have been created
$ losetup | grep disk
/dev/loop6         0      0         0  0 /home/ubuntu/disk1.raw                  0     512
/dev/loop7         0      0         0  0 /home/ubuntu/disk2.raw                  0     512
  4. Enter the LVM shell
$ sudo lvm
  5. Add a new disk
lvm> pvcreate /dev/loop6
  Physical volume "/dev/loop6" successfully created.
  6. Create a VG
lvm> vgcreate vg0 /dev/loop6
  Volume group "vg0" successfully created
  7. Create an LV
lvm> lvcreate -l 100%FREE vg0
  Logical volume "lvol0" created.
  8. Exit the LVM shell
lvm> exit
  9. Find our new LVM device
$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0           2:0    1    4K  0 disk
loop0         7:0    0 63.9M  1 loop /snap/core20/2105
loop1         7:1    0 63.9M  1 loop /snap/core20/2182
loop2         7:2    0 67.8M  1 loop /snap/lxd/22753
loop4         7:4    0 40.4M  1 loop /snap/snapd/20671
loop5         7:5    0 91.9M  1 loop /snap/lxd/24061
loop6         7:6    0    1G  0 loop
└─vg0-lvol0 253:0    0 1020M  0 lvm
loop7         7:7    0    1G  0 loop
loop8         7:8    0 38.8M  1 loop /snap/snapd/21465
sda           8:0    0   20G  0 disk
├─sda1        8:1    0 19.9G  0 part /
├─sda14       8:14   0    4M  0 part
└─sda15       8:15   0  106M  0 part /boot/efi
sr0          11:0    1   52K  0 rom
sr1          11:1    1 1024M  0 rom
  10. Format our new LVM volume
$ sudo mkfs -t ext4 /dev/mapper/vg0-lvol0
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 261120 4k blocks and 65280 inodes
Filesystem UUID: f13cb690-c69c-43ae-90bc-36886da4b0b5
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
  11. Mount the new LVM volume
$ sudo mkdir /mnt/storj
$ sudo mount /dev/mapper/vg0-lvol0 /mnt/storj
$ sudo chown $(id -u) /mnt/storj/
  12. Copy some file there (to emulate data)
$ cp disk2.raw /mnt/storj/
  13. Make an md5sum file to check that the file is not corrupted (this is not necessary, but it allows you to verify that everything went well)
$ md5sum /mnt/storj/disk2.raw > /mnt/storj/disk2.raw.md5
  14. Check it:
$ md5sum -c /mnt/storj/disk2.raw.md5
/mnt/storj/disk2.raw: OK
  15. Now we are going to add a new disk to move the data to.
$ sudo pvcreate /dev/loop7
  Physical volume "/dev/loop7" successfully created.
  16. Add it to vg0
$ sudo vgextend vg0 /dev/loop7
  Volume group "vg0" successfully extended
  17. Mark the first disk so it is not used for new allocations anymore (any new data will go to the second disk)
$ sudo pvchange -xn /dev/loop6
  Physical volume "/dev/loop6" changed
  1 physical volume changed / 0 physical volumes not changed
  18. Now move the data from the original disk to the new one
$ sudo pvmove /dev/loop6
  /dev/loop6: Moved: 32.55%
  19. Wait until it finishes (if something happens at this stage, other than disk damage, it is safe: you can always resume the move, even after a reboot)
  /dev/loop6: Moved: 100.00%
  20. Remove the first disk from vg0
$ sudo vgreduce vg0 /dev/loop6
  Removed "/dev/loop6" from volume group "vg0"
  21. Remove the PV for the first disk
$ sudo pvremove /dev/loop6
  Labels on physical volume "/dev/loop6" successfully wiped.
  22. Now check that the data is not damaged
$ md5sum -c /mnt/storj/disk2.raw.md5
/mnt/storj/disk2.raw: OK
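
For real drives the sequence is the same, just with the actual block devices instead of the loop files. A minimal sketch, assuming the old drive is the only PV in vg0 and shows up as /dev/sdX, while the new drive is /dev/sdY (both device names are placeholders):
$ sudo pvcreate /dev/sdY      # prepare the new drive as a physical volume
$ sudo vgextend vg0 /dev/sdY  # add it to the existing volume group
$ sudo pvchange -xn /dev/sdX  # forbid new allocations on the old drive
$ sudo pvmove /dev/sdX        # move all extents to the new drive; the node keeps running
$ sudo vgreduce vg0 /dev/sdX  # drop the old drive from the volume group
$ sudo pvremove /dev/sdX      # wipe its LVM label
If pvmove is interrupted by a crash or reboot, running pvmove again without arguments restarts the unfinished move from its last checkpoint.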

Dude, that's amazing!
Must have taken you ages to write that! Thank you so very much, I'll have a read and play with it on one of my toy machines :slight_smile:


Not at all. I did this once with my node. Before that it was on Windows and NTFS, and I migrated the data in place thanks to LVM without any damage.
However, my hardware has issues under Linux, so I migrated back (and lost some data because of random kernel panics…).
So, kernel panics can damage your NTFS data under Linux.
If you do not have hardware problems, it's pretty safe.

The funny thing is that this hardware works rock solid under Windows… as long as you do not upgrade to the latest Windows right away, but only after 6-7 months (or better, never… it's too old).
If you are interested, you can read the whole story here: