I’ve been trying to migrate a node to a new machine for a while, but the rsync takes so long, and by the time it finally finishes, the data has changed so much that the next sync takes just as long. I can’t seem to catch up. It’s roughly 4 TB over a 1 Gbit LAN, no hardware failures detected, no errors logged, no frozen processes.
Any tips? Has this been reported before? I couldn’t find anything about it.
I’m going to test using the --size-only flag suggested here to see if I can catch up.
Reduce your storage node’s allocated disk space to as little as possible; this way you have a chance of preventing ingress while rsync runs, leading to fewer changes to resynchronize.
It’s fine to just set it to 500GB for the duration of the migration, nothing will get lost.
Nope. It’s just a declaration of “I don’t want any more data for now”, nothing more. This setting is supposed to be used this way. I’ve got several nodes I want to downsize set up this way, and they slowly shrink as deletes come from the network.
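Assuming a standard setup: under Docker this is the `STORAGE` environment variable, and on a bare install it’s this line in config.yaml (the 500 GB figure is just an illustrative value below what the node already stores):

```yaml
# config.yaml — allocate less than the node already stores, so it
# advertises no free space and stops accepting ingress during the move.
storage.allocated-disk-space: 500.00 GB
```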
it takes a long time to migrate… the first few rsyncs will be slow, i think it’s like 2 TB per day due to the limits of hdd iops and such…
so you should expect the first rsync to take 2 days, then the next one might take 1 day, the next one like 12 hours… and it just keeps improving… i think i usually end up running like 7-12 rsyncs before the time gets down to a level i consider acceptable.
10-15 maybe 30 minutes… really it depends more on when i have time than when the rsync is quick enough.
i just query them endlessly or something similar.
so yeah long story short just keep running rsync you should catch up…
else shut down the node… it doesn’t really matter too much, i had like 3 days of downtime at the beginning of this month and it didn’t seem to matter at all… and you’ve got 12 days of runway until your node gets suspended, and a suspension doesn’t mean much… just no ingress…
also set your max capacity lower than what you’ve got, then ingress won’t slow you down and rsync will catch up easier…
storagenode ingress is like 1/2 Mbit avg max… if even that… closer to 1/4 most of the time or less.
so yeah you will catch up… just keep going…
I’ve migrated plenty of smaller nodes before, many times, while playing around with different hardware. But this migration has been going on for weeks. It does finish the sync rounds, but yeah, it can never catch up for some reason.
The potential problem I see here is that that speed is only reachable when transferring big chunks of data. Storj pieces are millions of very small files, so I highly doubt the rsync process can use all of your bandwidth.
When possible, that’s actually a great idea! that would run the thing at maximum speed for sure.
Oh wow, really? That seems like a lot for good disks like Exos.
Weird… Something seems off indeed.
Rsync cannot run parallel threads natively. So basically this tries to run several independent rsync instances in parallel, which helps max out the existing bandwidth.
This could work; however, the result must be checked very carefully to be sure that all of the source data has really been copied.
You could probably even use --ignore-existing. Since blobs can’t change anyway. Just make sure you don’t have that option for the last run when your node is offline.
Thanks guys, I might try these methods later on; right now I’m running with the --size-only flag after setting the node’s STORAGE to less than the amount of data stored. I’ll give that a day or two to see where I get. I’ll report back for the curious.
My own migration script uses rsync -avAXE --inplace --partial --del, but I haven’t used it for quite some time, so please verify whether all the flags make sense.
Well, the migration has finally completed. I still have no idea why one of the nodes was taking so long. My best guess is that the source disk or its interface had an issue I couldn’t catch. Besides that, everything went without a hitch.
Thanks everyone for all your suggestions and support!
Is it really required to remove that on the last run?
What I have found is that this option
forces rsync to skip any files which exist on the destination and have a modified time that is newer than the source file. (If an existing destination file has a modification time equal to the source file’s, it will be updated if the sizes are different.)
Wouldn’t a different modification time or size be enough to ensure that only the current files exist in the destination?
Much more important is to run the last rsync with the --delete option while the source node is stopped, to remove temporary database files; otherwise the databases would likely end up corrupted on the destination.