Hi, I am trying to migrate from Windows to Ubuntu. I have an 8TB HDD that is completely full, and I'm trying to copy it to a 40TB ZFS array. However, due to the number of files and the slow HDD, rsync just stops after 4 days. The transfer speed shown in the status is about 1MB/s, which would take months to copy… any thoughts on how to copy it faster?
PS - I tried using --no-inc-recursive, but that hung the process for 20 hours and never started the transfer. If I start it the standard way, it drops to 580KB/s or less once 1TB has been copied.
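For reference, the copy is just a plain local rsync of roughly this shape (paths are placeholders for the actual mount points):

```
rsync -aH --info=progress2 /mnt/old-hdd/ /tank/node/
```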
You need to figure out what the bottleneck is there - either the source, the destination, or both.
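For example, watching both ends while the copy runs usually shows which side is saturated (pool name is a placeholder):

```
iostat -x 5             # per-device utilization and wait times on the source HDD
zpool iostat -v tank 5  # per-vdev throughput on the destination pool
```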
But I guess the bottleneck here might be the ZFS array; a parity array usually runs at the speed of the slowest disk in the array. For a 40TB array, having only 24GB of RAM sounds insufficient; the common recommendation is to have at least 1GB of RAM per TB on ZFS.
You can speed up the process: if you set the allocation below the usage shown on the dashboard, it will stop any new ingress, so there is less data to migrate.
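As a sketch, assuming the standard config layout, that just means lowering `storage.allocated-disk-space` in config.yaml (or the STORAGE variable if you run the docker container) to something below the used space shown on the dashboard, then restarting the node:

```
# config.yaml - pick a value below the currently used space shown on the dashboard
storage.allocated-disk-space: 7.0 TB
```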
You may also try to use rclone instead of rsync.
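Something along these lines, as a sketch (placeholder paths; tune --transfers/--checkers to what the disks can keep up with):

```
rclone copy /mnt/old-hdd /tank/node --transfers 16 --checkers 16 --progress
```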
Well, " ARC memory mainly impacts read performance in ZFS, not writes. ARC (Adaptive Replacement Cache) is a read cache, so it stores frequently accessed data to speed up subsequent read operations. However, writes are handled through a separate mechanism called the ZFS Intent Log (ZIL), which can use a dedicated device (like an SSD) for improved write performance." So in my situation RAM is really not an issue , also i have cloned my HDD with DD , and now copying from HDD which is not used for real time STORJ at all. So there is basically no other load on IO and it still sucks
And thanks for the advice - I will try rclone.
You didn’t mention whether you are using an SSD special device.
Did you mount the NTFS drive under Linux? If so, this is the reason: the FUSE implementation of NTFS (ntfs-3g) is not performant.
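You can check which driver is actually in use; fuseblk means the slow ntfs-3g FUSE path, while the in-kernel ntfs3 driver (available since kernel 5.15) is considerably faster (mount point is a placeholder):

```
findmnt -T /mnt/old-hdd   # FSTYPE column: fuseblk = ntfs-3g (FUSE), ntfs3 = kernel driver
```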
Everyone claims that ZFS can be fast; however, my tests didn’t confirm that. With millions of files, it was the slowest filesystem of the ones I tested: ext4, BTRFS, and ZFS.
A 10 TB storj node copy between different file systems will always take weeks. So it doesn’t matter much whether you choose zfs or ext4.
However, I would recommend using single HDDs for storj nodes. Using any kind of disk array for storj only makes sense if it already exists for some other reason.
You can use ZFS, but you need to use an SSD special device and/or 1GB of RAM per 1TB of storage.
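A sketch of adding a special vdev to an existing pool (pool and device names are placeholders; the special vdev should be mirrored, since losing it loses the whole pool, and only newly written metadata lands on it):

```
# mirrored SSDs for metadata
zpool add tank special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B
# optionally also keep small blocks on the SSDs
zfs set special_small_blocks=64K tank
```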
However, I agree with @alpharabbit: using a single disk per node is much more efficient and less IOPS-bound (any RAID will consume IOPS for itself).
And yes, copying millions of files will take a lot of time. However, you may try the rclone approach. If you do, please report whether it is faster.
I suggest using rclone sync, because it will also delete files in the destination that were deleted from the source.
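For the final pass, something like this as a sketch (same placeholder paths; --dry-run first shows what would be deleted):

```
rclone sync /mnt/old-hdd /tank/node --transfers 16 --checkers 16 --progress --dry-run
# rerun without --dry-run once the planned deletions look right
```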
@alpharabbit, I do have 8 x 6TB drives, so there is no way I can use them as singles unless I start 8 nodes on one PC, which is also not something I really want to do :). Plus, I had one of my nodes disqualified last year when my HDD died. So RAID is not that bad if you think about it as a single node plus a failsafe. Plus, having 8 drives together actually makes it quite fast (though I don’t know how it will behave once it’s filled with a billion files).
@alpharabbit I don’t know, but that would mean something like 8x the network load (I am on a home internet connection, which is quite an important factor as well) and 8x the maintenance (as you have to look after each node from time to time anyway). And I am not sure this actually brings a better outcome in terms of earnings. And of course I have a one-drive failsafe with RAID, which is also good. Do you think I should consider fragmenting it anyway? I would be glad to hear other opinions, as I can still pick the way I go.
@Julio - thanks for the suggestion. I found that Windows + FastCopy gets it up to 7.8MB/s. I will leave it running for a day to see how it goes, since rsync’s speed dropped during the 4th day.
Excellento! Glad that’s working better for you.
An upside to your patience: your existing data will be relatively, if not entirely, freshly defragmented.
1GB per TB is a very outdated recommendation and was based around providing enough RAM to run deduplication. Since deduplication would never work on storj (and indeed on most workloads), it’s always better to just buy more disk instead of the absurd amount of RAM needed to make ZFS deduplication “work”.
I’m running multiple nodes on a single ZFS array that’s also hosting other workloads, with no special device or separate log device (SLOG), with no problems, and I’m probably in the range of 0.3GB RAM per TB after accounting for the RAM used by those workloads.
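If anyone wants to check what their ARC is actually doing, the stock tools on a standard OpenZFS install show it:

```
arc_summary | head -n 40                      # current ARC size, target size, hit rate
cat /sys/module/zfs/parameters/zfs_arc_max    # configured cap; 0 means the default (~half of RAM)
```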
Using the Windows built-in tar.exe to archive each blobs subfolder (aa, ab, …) in a script.
Copying the archives over SMB and extracting them in batches.
robocopy over SMB while the node is offline for the final sync.
It was painful, but it allowed me to pause/resume (to let random Storj GC jobs finish, for instance) without restarting the whole listing. I tried a multi-threaded copy, but it wasn’t much faster, and hammering the disk was causing my online score to drop. The rough shape of the batching is sketched below.
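A rough sketch of that batching, assuming the built-in bsdtar on Windows 10+ and placeholder paths/share names:

```
:: Windows - archive one blobs prefix folder at a time
tar -cf D:\staging\aa.tar -C C:\storagenode\blobs\<satellite>\aa .

:: Windows - final pass over SMB while the node is offline
robocopy C:\storagenode \\ubuntu-box\node /MIR /R:1 /W:1
```

and on the Linux side each batch just gets unpacked into the matching folder:

```
mkdir -p /tank/node/blobs/<satellite>/aa
tar -xf /mnt/smb-staging/aa.tar -C /tank/node/blobs/<satellite>/aa
```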