Prompted by the recent post about a node migration plan, I went back through my notes on parallelizing copy / rsync. I found three options there.
1. The first option is rsync with GNU parallel:
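# for local use (files only: passing directories together with -a would make
# a single rsync job recurse over the whole tree; empty directories are skipped):
cd src-dir
find . -type f |
parallel -j10 -X rsync -zR -Ha ./{} ~/path/to/dest-dir/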
# for ssh use (files only, as above; -R already means --relative):
cd src-dir
find . -type f |
parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
# The flags [-zR -Ha] are just an example; this is a very preliminary draft that I briefly used for something else.
2. The second option is parsyncfp and parsyncfp2.
http://moo.nac.uci.edu/~hjm/parsync/
I admit that I have never tried this software.
3. The third option is Oracle File Storage Parallel Tools.
I have used it for something other than Storj and had very positive results. However, the environment was specific, and I have no hard numbers on the speedup, only the impression that it was visible and rather significant.
The follow-up discussion, as I understand it, was not favorable to the first option in a standard local environment, that is, when syncing files between two local drives or two directories on one local drive. The objection concerned drive limits in terms of seek latency.
To be honest, when I prepared those notes, my use case was mostly transferring files over the internet, not just within a local machine.
I am wondering whether you have any experience with parallelizing copy / rsync locally or over the internet, and whether you can share additional comments or, even better, measured results.
I believe this might be useful, as not everybody is transferring “entire filesystems” and rsync is, as far as I know, the method recommended by Storj. So I am making this a separate topic in the hope of some substantive discussion.