Rsync never catches up, unable to migrate

atomsymbol · July 2, 2022, 7:57am

A major performance issue with using (or recommending) rsync to copy 4 TB of data stored in 9.4 million Storj files on an HDD is that rsync doesn’t read the file data in physical disk order (see the FS_IOC_FIEMAP and BTRFS_IOC_TREE_SEARCH_V2 ioctls). If rsync were able to read the file data in disk order, it could reach read speeds of 100-200 MB/s instead of the 5-10 MB/s it currently manages, because reading in directory order causes a large number of random HDD accesses.
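To illustrate what "physical disk order" means here, below is a rough Python sketch (Linux only) that asks the kernel for the first extent of each file via FS_IOC_FIEMAP and sorts a list of paths by that address. The function names `first_physical_offset` and `sort_by_physical_offset` are made up for this example, and nothing here is part of rsync. It also assumes the filesystem supports FIEMAP. On Btrfs the reported address may not correspond directly to an offset on the device, which is presumably why BTRFS_IOC_TREE_SEARCH_V2 comes into play there.

```python
import fcntl
import os
import struct

# _IOWR('f', 11, struct fiemap) from <linux/fs.h>
FS_IOC_FIEMAP = 0xC020660B
# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (32 bytes)
FIEMAP_HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3] (56 bytes)
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")


def first_physical_offset(path):
    """Return fe_physical of the file's first extent, or None if unknown."""
    buf = bytearray(FIEMAP_HEADER.size + FIEMAP_EXTENT.size)
    # Map the whole file, but leave room for only one extent in the reply.
    FIEMAP_HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, 1, 0)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    except OSError:
        return None  # filesystem without FIEMAP support, etc.
    finally:
        os.close(fd)
    mapped_extents = FIEMAP_HEADER.unpack_from(buf, 0)[3]
    if mapped_extents == 0:
        return None  # empty or fully sparse file
    return FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HEADER.size)[1]


def sort_by_physical_offset(paths, offset_fn=first_physical_offset):
    """Sort paths by first-extent address; unknown offsets go last."""
    def key(path):
        offset = offset_fn(path)
        return (offset is None, offset or 0)
    return sorted(paths, key=key)
```

A copy tool that walked the files in the order returned by `sort_by_physical_offset` would, at least for mostly unfragmented files, move the disk head in one direction instead of seeking back and forth. Looking up 9.4 million files this way costs one `open` and one ioctl per file, so the ordering pass itself is not free.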

Another performance issue is that (as far as I know) rsync doesn’t take advantage of io_uring on Linux. A related issue is that, for the Linux kernel to read 9.4 million files from an HDD more efficiently, rsync would have to keep at least 20,000 files open at all times (see RLIMIT_NOFILE and “ulimit -n”).
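For reference, this is roughly how a process can raise its own RLIMIT_NOFILE soft limit toward the 20,000 mentioned above, the in-process equivalent of “ulimit -n”. It is a small Python sketch, and `raise_open_file_limit` is a hypothetical helper name. An unprivileged process can only raise the soft limit as far as the hard limit, so the request is clamped to that.

```python
import resource


def raise_open_file_limit(wanted=20000):
    """Raise the RLIMIT_NOFILE soft limit toward `wanted`; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # Clamp to the hard limit, which only a privileged process may raise.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_open_file_limit(20000)
    print(f"open-file limit: soft={soft} hard={hard}")
```

If the hard limit is below 20,000, it has to be raised outside the process first, for example by root or through the system's limits configuration.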

andrew2.hart · July 2, 2022, 3:45pm

A weird thing I’ve found is that rsync works better over the network than with local disks. Or maybe my expectations…