Migrating to another drive

jordan30001 · March 29, 2023, 8:11pm

The hard drive in one of my nodes has a lot of bad sectors, and it looks like it’s dying.
Long story short, the drive is heavily fragmented, so Windows estimated over a week to transfer it, which in Windows time probably means even longer. I figured the node would be DQ’d before the transfer finished.
So I thought I could write a small program that moves the small directories one at a time and puts an NTFS junction link in place of each one.
Then I noticed that my audit score dropped about 4% in a few hours.

And in the log file there were lots of errors saying that Storj couldn’t read/write files through these NTFS junctions because they pointed to another hard drive, which seems like a silly limitation.

So my question is: how do you normally migrate a drive when the Storj data is really big, e.g. 10 TB+?
About 5 TB has transferred so far, so even if I stop my node now to move the rest, it’s probably still going to get DQ’d because of how long the transfer will take. I guess I just bite the bullet and kill the node without a graceful exit?
It’s really unfortunate that it can’t read/write through NTFS junction links, since the Windows filesystem APIs handle move/copy operations regardless of whether a junction is in place; the entire reason junctions exist is for scenarios like this.

lyoth · March 29, 2023, 8:42pm

I believe you can be offline for 12 days before you get DQ’d.
Will the other 5TB take longer than 12 days?

jordan30001 · March 29, 2023, 8:52pm

With the amount of fragmentation on the drive, Windows estimates 7 days to transfer it all, but in my experience Windows transfer estimates are a complete lie; it usually takes double the time it says.

snorkel · March 29, 2023, 10:12pm

I believe it’s 12 days for suspension and 30 days for DQ, so 42 days in total to recover.

bre · March 29, 2023, 11:28pm

Your node can be offline for 288 hours (12 days) before it gets suspended.

Disqualification for downtime applies only to long downtimes (more than 30 days offline).
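To see why the transfer estimate matters, here is a rough sketch comparing an offline window against the thresholds quoted above. The 288-hour and 30-day figures come from this thread, and the 2x slowdown factor is the rule of thumb from the earlier post about Windows estimates; none of these are taken from official Storj documentation, and the function names are hypothetical.

```python
# Thresholds as quoted in this thread; treat them as assumptions, not official values.
SUSPENSION_HOURS = 288    # offline time before suspension (12 days)
DQ_DOWNTIME_DAYS = 30     # downtime DQ only applies beyond ~30 days offline


def offline_hours(estimate_days: float, slowdown: float = 2.0) -> float:
    """Hours offline if the copy takes `slowdown` times the estimated duration."""
    return estimate_days * 24 * slowdown


def fits_before_suspension(estimate_days: float, slowdown: float = 2.0) -> bool:
    """True if the (slowed-down) copy finishes within the suspension threshold."""
    return offline_hours(estimate_days, slowdown) <= SUSPENSION_HOURS


def exceeds_dq_window(estimate_days: float, slowdown: float = 2.0) -> bool:
    """True if the copy would run past the ~30-day downtime DQ threshold."""
    return offline_hours(estimate_days, slowdown) > DQ_DOWNTIME_DAYS * 24


if __name__ == "__main__":
    est = 7  # Windows' 7-day estimate for the remaining data
    print(f"{offline_hours(est):.0f} h offline, "
          f"before suspension: {fits_before_suspension(est)}, "
          f"past DQ window: {exceeds_dq_window(est)}")
```

With the doubled 7-day estimate (336 hours) the node would be suspended but, by these figures, not disqualified for downtime; audit failures from unreadable data are a separate risk.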

jammerdan · March 30, 2023, 2:47am

If your audit score is dropping this quickly, you might want to stop the node and transfer the files while it is not running.
On Windows you should use Robocopy with multithreading (the /MT switch).

If you have bad sectors, look at Robocopy’s retry and wait switches (/R and /W) and adjust their defaults; see the robocopy page on Microsoft Learn.
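As a sketch of that advice, the helper below assembles a robocopy command line with lower retry and wait values. The switches /MIR, /MT, /R and /W are documented robocopy options; the function name and the specific values (2 retries, 5-second wait) are hypothetical choices for a failing disk, not settings recommended in this thread.

```python
from typing import List


def robocopy_args(src: str, dst: str, threads: int = 8,
                  retries: int = 2, wait_s: int = 5) -> List[str]:
    """Build (but do not run) a robocopy command for mirroring node data.

    /MIR    mirror the tree; this also deletes files in dst that src lacks
    /MT:n   copy with n threads (robocopy accepts 1-128, default 8)
    /R:n    retries per failed file (robocopy's default is 1,000,000)
    /W:n    seconds to wait between retries (default 30)
    """
    if not 1 <= threads <= 128:
        raise ValueError("robocopy /MT accepts 1 to 128 threads")
    return ["robocopy", src, dst, "/MIR", f"/MT:{threads}",
            f"/R:{retries}", f"/W:{wait_s}"]


if __name__ == "__main__":
    # Example paths only; adjust to your drives.
    print(" ".join(robocopy_args(r"D:\storagenode", r"E:\storagenode")))
```

On Windows the list can be passed to `subprocess.run`. With low /R and /W values robocopy gives up on unreadable files quickly instead of stalling for days, so review its output for files it could not copy.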

Alexey · March 30, 2023, 3:53am

Junction links don’t work properly with storagenode data; do not use them.

You can reduce the allocation below the current usage, save the config and restart the node; your node will then stop accepting new data.
After that, run:

robocopy /MIR /mt:8 D:\storagenode E:\storagenode

Run it several times until the difference is negligible, then stop and disable the storagenode service from an elevated PowerShell:

Set-Service storagenode -StartupType Disabled
Stop-Service storagenode

Then run the robocopy command above one more time.
After that, modify your config.yaml to use the new path for the data, save it, then re-enable and start the node:

Set-Service storagenode -StartupType Automatic
Start-Service storagenode

The other way is to do the same with the storagenode service stopped from the start. However, in that case the node will be offline for the whole copy.
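The "run it several times until the difference is negligible" step can be automated by reading robocopy’s exit code, which is a bit mask: 1 means files were copied, 2 extra files were found, 4 mismatches were found, 8 some copies failed and 16 a fatal error occurred. The loop below is a sketch that stops after the first pass that copies nothing new; `mirror_until_stable` and the five-pass limit are hypothetical, and the copy pass is passed in as a function so the logic can be tried without robocopy.

```python
from typing import Callable, List, Tuple

# Robocopy exit-code bits.
COPIED = 1      # one or more files were copied
EXTRA = 2       # extra files/dirs were found in the destination
MISMATCH = 4    # mismatched files/dirs were detected
FAILED = 8      # some files/dirs could not be copied
FATAL = 16      # serious error; robocopy did not copy anything

NAMES = {COPIED: "copied", EXTRA: "extra", MISMATCH: "mismatch",
         FAILED: "failed", FATAL: "fatal"}


def describe(code: int) -> List[str]:
    """Names of the bits set in a robocopy exit code."""
    return [name for bit, name in NAMES.items() if code & bit]


def mirror_until_stable(run_pass: Callable[[], int],
                        max_passes: int = 5) -> Tuple[int, bool]:
    """Repeat mirror passes until one copies nothing new.

    Returns (passes_run, stable). Codes of 8 or more mean a failed pass.
    """
    for n in range(1, max_passes + 1):
        code = run_pass()
        if code >= FAILED:
            raise RuntimeError(f"pass {n} failed: {describe(code)}")
        if not code & COPIED:
            return n, True
    return max_passes, False
```

On Windows, one pass could be `lambda: subprocess.run(["robocopy", r"D:\storagenode", r"E:\storagenode", "/MIR", "/MT:8"]).returncode`. While the node is still running, deletions and other changes keep appearing, so a pass that copies nothing may never come; that is why the final pass is done with the service stopped.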

Toyoo · March 30, 2023, 8:38pm

Usually the fastest way to copy a whole disk is not to copy it file by file, but to copy the whole disk image. There are many tools for that, for example the open-source ntfsclone. They are faster because they don’t locate each file separately (each one sitting in a different location); instead they scan the file system to find which blocks are in use, then copy those blocks sequentially. Sequential reads are fast. You could expect the copy to proceed at 100 MB/s or more, which should be good enough to finish in a day or two with the node turned off, without risking disqualification.

However, the complicating factor is the bad blocks. I don’t know which of those tools will cope with them satisfactorily.
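The "a day or two" figure is easy to check. This sketch assumes a sustained sequential rate and decimal units, and counts only used space, since image tools like ntfsclone skip unused blocks; retries on bad blocks would add time. The function name is hypothetical.

```python
def copy_hours(data_tb: float, speed_mb_s: float) -> float:
    """Hours to read `data_tb` terabytes sequentially at `speed_mb_s` MB/s.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    seconds = (data_tb * 1e12) / (speed_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for tb in (5, 10):
        print(f"{tb} TB at 100 MB/s: {copy_hours(tb, 100):.1f} h")
```

Even 10 TB at 100 MB/s comes to roughly 28 hours, well inside the 288-hour offline window mentioned earlier in the thread.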

Vadim · March 30, 2023, 9:16pm

Any link for Windows? It looks like it’s Linux-only.

Alexey · March 31, 2023, 4:12am

You can also use a generic partition cloning program, such as Partition Magic or similar tools.
Or you can boot from an Ubuntu Live USB stick and use ntfsclone from Linux.
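For the bad-block concern raised above, ntfsclone has a `--rescue` option that ignores read errors on bad sectors instead of aborting, and `--overwrite DEST` writes the clone directly to a partition or file. The sketch below only builds the command line; the device names `/dev/sdb1` (failing source) and `/dev/sdc1` (new destination) are hypothetical and must be checked (e.g. with `lsblk`) before running anything, and the destination should be at least as large as the source. Unreadable sectors are replaced with filler, so data stored there is still lost.

```python
from typing import List


def ntfsclone_args(source: str, dest: str, rescue: bool = True) -> List[str]:
    """Build (but do not run) an ntfsclone command copying `source` onto `dest`.

    --rescue          ignore read errors on bad sectors instead of aborting
    --overwrite DEST  write the clone directly to DEST (device or image file)
    """
    args = ["ntfsclone"]
    if rescue:
        args.append("--rescue")
    args += ["--overwrite", dest, source]
    return args


if __name__ == "__main__":
    # Hypothetical devices: /dev/sdb1 = failing NTFS disk, /dev/sdc1 = new disk.
    print(" ".join(ntfsclone_args("/dev/sdb1", "/dev/sdc1")))
```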

Toyoo · March 31, 2023, 7:42am

Sorry, I don’t know much about Windows tooling.

jordan30001 · April 4, 2023, 7:45pm

I shut down the node as soon as I posted, but it looks like I got DQ’d anyway because enough audits failed. Well, lesson learned for next time: don’t use NTFS folder junction links to try to keep the node running while transferring.

Alexey · April 6, 2023, 4:00am

Sorry to read that. Yes, junction links rarely work as expected; there are too many conditions. For example, the SYSTEM user may not be able to access them because of a permissions issue, or they are simply not supported when they point to another disk (which seems to be your case).
I’m not sure symlinks on Linux would work either.