Node Migration - rclone

Hello all,

Long-time Storj Node Operator here. I’m in a bit of a pickle. The recent deletion of test data has left me with about 6TB on a single node. I’m working on migrating that data to a new disk (16TB), but I’m finding that the migration just can’t keep up at this point.

I’m running the following command on my node:

rclone copy --transfers=24 --checkers=16 --ignore-checksum --progress --order-by modtime,descending /mnt/disk6/storj/Data/storage/blobs/ /mnt/disks/StorjNode1/Data/storage/blobs/ --exclude=/Data/storage/trash/** --exclude=/Data/storage/garbage/** --exclude=/Data/orders/archive/**

I’ve tweaked this command a ton and found that this combination gives me the best performance. I’m using rclone to handle parallelization, which I found works better than rsync with the parallel package.

I’m finding that the normal activity of my node, combined with the migration (rclone), is putting stress on my system and everything slows down. My iowait is roughly 40-60% at this point. The steps I’m following are:

  • Migrate (rclone) the data from the original disk to the new disk (while the node is running).
  • Stop the node.
  • Run rclone again to ensure all data is copied.
  • Update the config and point it to the new disk (a rough command sketch follows this list).
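
Roughly, that sequence looks like this. It’s just a sketch, assuming a Docker-based node with the container name storagenode (adjust the container name and paths to your own setup):

# 1. First pass while the node is still running (the command above; can take days)
rclone copy --transfers=24 --checkers=16 --ignore-checksum --progress /mnt/disk6/storj/Data/storage/blobs/ /mnt/disks/StorjNode1/Data/storage/blobs/

# 2. Stop the node so nothing changes underneath the final pass
docker stop -t 300 storagenode

# 3. Second pass to pick up anything written since the first one
rclone copy --transfers=24 --checkers=16 --progress /mnt/disk6/storj/Data/storage/blobs/ /mnt/disks/StorjNode1/Data/storage/blobs/

# 4. Update the config / docker mount to point at /mnt/disks/StorjNode1 and start the node again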

At this point, the high I/O is causing my online score to drop (92-93%), but everything else is fine. What level of impact do audits have on my “suspension”? Is there any point where I just say F*** it, stop the rclone, and point the node to the new disk?

I almost wish there were a way to put my node into a maintenance mode (no new ingress, but still allow egress), so that I/O is reduced and no new data is added.

Did you set your max stored size to less than what you’re currently holding? (Like set it to 1TB or something.) You don’t want your node participating in the new-data performance testing while also copying in the background.
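
If it helps, that’s just the allocated-space setting. A minimal sketch of the two common places to change it, assuming a Docker-based node (other flags omitted) or a config.yaml setup:

# Docker: recreate the container with a smaller allocation
docker run ... -e STORAGE="1TB" ... storjlabs/storagenode:latest

# Or in config.yaml:
storage.allocated-disk-space: 1.0 TB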

Hi, Chris. If you’re struggling that much, you could just take your node offline and copy the data across. It might be faster that way?
Your node won’t be suspended until the online score reaches 60%, and that should take about 10 days (ish; I think it could be a bit more than that).

Is your current node full, then? If you still have a couple of TBs free, perhaps it would be easier just to start a new node on a larger HDD and let your current one fill up to capacity…

I would suggest a search on the forum, as there are multiple suggestions on node migration strategies from people far more knowledgeable than me.

Good luck! :smiley:

That wouldn’t cause data to be offloaded from my node, would it? I’m guessing I could set it to 6TB (or 5.67 TB, since 5.68 TB is used) to prevent too much of a loss.

Your node doesn’t start offloading data. It will just not take any new ingress… and it will slowly and naturally shrink as data gets deleted and trash is taken out over time. Maybe you lose a tiny bit, but it will be worth it to complete your migration quickly.

Many people set it to 1TB (or any number less than what you have now) and let a couple of rsync passes run. Then they turn the node off and do one final rsync. Then they start the migrated node and allow it to use the new space. You can do the same thing with rclone.
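
For reference, a sketch of that rsync pattern (the paths here are placeholders, not from this thread; the final pass adds --delete so files removed on the source are also removed on the destination):

# run this a couple of times while the node is still up
rsync -aP /mnt/old-disk/storj/ /mnt/new-disk/storj/

# stop the node, then one final pass that also mirrors deletions
rsync -aP --delete /mnt/old-disk/storj/ /mnt/new-disk/storj/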

By the way, the final sync must delete from the destination whatever was deleted on the source.
For rclone sync that is the default behavior, so you need to use rclone sync instead of rclone copy for the final run.
If you don’t, your destination may contain outdated database files, and when you start the node in its new place they could corrupt the databases.
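
For example, something like this for the last pass (a sketch based on the paths earlier in this thread, syncing the whole Data directory so the databases are included in the final run):

rclone sync --transfers=24 --checkers=16 --progress /mnt/disk6/storj/Data/ /mnt/disks/StorjNode1/Data/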

You could take a look at This

I’ve been migrating 4 nodes to new hard drives. I’ve tested multiple methods, but not rclone yet. Currently waiting for the biggest node to finish the sync so I can start the last two. Only 700GB to go :slight_smile:

I took the nodes offline at the start of the sync. This speeds up the transfer and means you don’t have to “keep up”, as you put it. Nodes can be down for ~12 days before they get suspended, and after that it takes even longer to get disqualified. My biggest node has 7TB of used data, and the move has taken 8 days so far. I expect it to be done by the end of the day tomorrow, well within the 12-day margin.

Morcin, I recommend you check out rclone. Its transfers are parallelized, which I found to be much quicker than rsync. You’ll just need to tune the transfers and checkers counts for your system.

I transferred just under 7TB in about 3 days, mostly with the node online… My system did go down a few times because of the iowait, but I’m happy with the performance.

For everyone: I host my node on Unraid. I was using Unraid shares, which let multiple disks sit under one path so the storage can keep expanding (if needed). That turned out to be my biggest bottleneck. Because of the overhead of going through a share, my transfer performance was limited by the OS. When I referenced the disk directly (/mnt/disk6/…), I removed that overhead and transfers sped up dramatically!
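
For context, on Unraid the user shares are served through the shfs/FUSE layer, while the per-disk paths bypass it. A hypothetical example of the two forms (the share name here is made up):

# via the user share (goes through the FUSE layer; slow for heavy small-file I/O)
/mnt/user/storj/Data/storage/blobs/

# referencing the disk directly (bypasses the share overhead)
/mnt/disk6/storj/Data/storage/blobs/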

Thanks for the reminder @Alexey . Just finished my transfer, moved my databases from a share to a local NVMe, and I’m off again. Time will tell whether my audit score drops at all, but I think I’m out of the woods.
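
For anyone wanting to do the same: the storagenode config lets you point the databases at a separate path. A minimal sketch, assuming the NVMe is mounted at /mnt/nvme/storj-db (a made-up path) and that you move the existing *.db files there while the node is stopped:

# in config.yaml: keep blobs on the big disk, move the SQLite databases to the NVMe
storage2.database-dir: /mnt/nvme/storj-db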

Thanks everyone!

Thanks, I’ll try rclone for the other two nodes.
As for your setup with Unraid, just some advice, since I’ve been down that road: if you are using your Unraid server for anything other than Storj, don’t put the nodes on the array. Your performance will degrade over time as your node gets bigger. Also, the Unraid parity checks do not play nicely with the I/O the nodes generate. It will cause the parity checks to take days, all the while your nodes suffer and lose uptime.

I started my own nodes on Unraid that way. Later, as they grew, I moved my nodes over to separate disks using the Unassigned Devices plugin, and then, as soon as Unraid started supporting them, single-disk pools. I’m now at the point of moving my nodes away from Unraid entirely to a dedicated server (though that is not for performance reasons; the single-disk pools actually perform great).