Failing to copy files to another disk. Need advice

crossed the 10% mark, and we’re down to 4.75MB/s

yes, correct. I kind of assumed this was ZFS to ZFS

ZFS works better with multiple drives


There will not be an 8x network load. All nodes behind the same /24 subnet of public IPs are treated as one big node for customers’ uploads, but as separate nodes for downloads, audits, repairs and online checks: we want to be as decentralized as possible.
This would also let you avoid RAID, because it may fail as well:

But even with 2 HDD failures, you would lose only 2/8 of the total data, unlike a failed RAID with one big node, where you would lose everything.
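To see which nodes would be counted together for uploads, here is a rough sketch that groups node addresses by their /24 network. The node names and addresses are made up for illustration (documentation-range IPs), and this only mirrors the grouping rule described above; it is not how the satellite actually implements node selection.

```python
import ipaddress
from collections import defaultdict


def group_by_slash24(node_ips):
    """Group node names by the /24 network that contains their public IP."""
    groups = defaultdict(list)
    for name, ip in node_ips.items():
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(network)].append(name)
    return dict(groups)


# Hypothetical nodes; the addresses are from the reserved documentation ranges.
nodes = {
    "node1": "203.0.113.10",
    "node2": "203.0.113.77",
    "node3": "198.51.100.5",
}

groups = group_by_slash24(nodes)
for network, names in groups.items():
    print(network, names)
```

In this example node1 and node2 land in the same group, so they would share upload traffic as if they were one node, while node3 would be selected independently.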

This doesn’t make sense: if you are not breaking the rules, then all of them are behind the same /24 subnet of public IPs, so they would get the same traffic as one single node. It’s also against the recommendations (and the ToS in particular) to run more than one node on the same storage.


I know. It can also autocorrect errors in that case, unlike single-disk ZFS.
But a single disk per node is a much more efficient use of the storage. Of course you may use RAID if you already have it, but I wouldn’t recommend setting it up only for Storj.
The redundancy is built into the Storj protocol, so there is no need for another layer in your setup. You can spread the load simply by running several nodes, each on its own disk, which would also help you survive almost any number of disk failures. Each failed node would be lost along with its data, of course, but that is only n/m of the total data, where m is the total number of nodes and n is the number of failed nodes.
With 2 disk failures your RAIDZ will die, along with the node on it. The total loss would be 1/1.
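The n/m comparison can be worked through with a small sketch. It assumes every node holds an equal share of the data and that the RAIDZ pool has single parity (so it survives one disk failure but not two); both are simplifications for illustration, and the function names are made up.

```python
from fractions import Fraction


def lost_fraction_separate_nodes(failed, total):
    """One node per disk: each failed disk takes only its own node's data."""
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("need 0 <= failed <= total and total > 0")
    return Fraction(failed, total)


def lost_fraction_single_pool(failed, parity=1):
    """One node on a RAIDZ pool: everything is lost once failures exceed parity."""
    return Fraction(1) if failed > parity else Fraction(0)


# 8 disks, 2 of them fail
print(lost_fraction_separate_nodes(2, 8))   # 2/8 of the data, reduced to 1/4
print(lost_fraction_single_pool(2))         # the whole node
```

With one failed disk the pool loses nothing while the separate nodes lose 1/8, so the trade-off only turns against the pool once failures exceed its parity level.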

Not sure why you’d go straight to assuming I’d be breaking rules; they’re all on the same IP, and always have been, even when I previously had them pointed at separate individual drives on the same server. I needed the drive slots to change the layout of my main zpool and it never made sense to add more drives back to the server just for storj. I could graceful exit some of them but I also don’t see any reason to.

So, it was not intentional? Then I’m sorry, I didn’t know that.
Using the same storage for all nodes is not an optimal setup: they will affect each other, at least if they use the same dataset.
With separate datasets they might not affect each other, but I think your pool is under heavy load anyway, especially when they run the used-space filewalkers, TTL collector, trash chore or garbage collector at the same time.
You may reduce the allocation (or the quotas) on all nodes except one; it would then take all the traffic, while the limited ones would slowly drain. That way you can end up with a single node and reduce the load on your system.

I also had more luck with rclone than rsync. I mean… maybe 50% faster. Still took weeks.


OK, for those who are interested - final report :slight_smile:
PS - final resync took another 11 hours
