Migrating a node

Hi!

The following applies mainly to a Synology NAS holding a node (or more). The issue is too much CPU IO wait, leading to an unresponsive NAS and too many lost Storj races.
Last time I checked in, someone from Storj was trying to help me with my setup, but I didn’t think we were heading anywhere. Regardless, my apologies for not following up on that.
Meanwhile, I also tried to get help from Synology, but that was a mistake. Turns out I didn’t spend my life savings buying their stupid RAM modules (rebranded from ADATA), therefore any problem I might experience is due to “unofficial memory”. They said I should change my RAM modules; I said I would change everything around my RAM modules but would always keep those.

I installed a new disk, formatted it as ext4 (yes, I do think one shouldn’t use BTRFS for Storj, whether or not you enable “record file access time” and “file self healing”) and started the process of moving my 6.3TB node to it.
The first rsync took 16 days (large single node instead of multiple small nodes, anyone?). I’ve started the second rsync today and estimate it will take 2-3 days. The problem is that there’s nothing to gain from doing a second rsync. I should have just stopped the node before starting the 2nd rsync and made it the final one. It takes a negligible amount of time to copy the new files from each directory, but it takes ~30 seconds to start copying each directory (figuring out what to copy, I guess…). 30 seconds, times 32 directories, times 32 subdirectories in each of them, times 6 satellites, and you end up with more than 2 days. Since the actual “copy time” is negligible, you will always end up with ~2 days for the last rsync.
This is information I would like to have had before I started moving my node. I hope it’s useful for somebody.
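For reference, the back-of-the-envelope math behind that 2-day estimate (assuming ~30 seconds of scanning overhead per subfolder, 32 × 32 subfolders per satellite, 6 satellites):

    # rough estimate of the per-directory overhead of an rsync pass
    echo $(( 30 * 32 * 32 * 6 )) seconds    # 184320 s, i.e. a bit over 2 days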

Other than that, the Storj doc “How do I migrate my node to a new device? - Storj Node Operator Docs” should be corrected. It explicitly states that the subdirectory “/storage/” was left out of the mount command because the docker container will add this subdirectory. But the rsync command “rsync -aP /mnt/storj/storagenode/storage/ /mnt/storj2/storagenode-new/storage/” will add another /storage subfolder, so the data ends up in “/whatever/storage/storage/”. To match the rsync command, maybe the mount command should contain the “/storage” subfolder…

I think the rsync command is OK, but it is very sensitive to the trailing / on the first path argument. If it’s omitted, rsync transfers the “storage” directory itself, i.e. creating the duplicate path entry. With the slash, it transfers the contents of the “storage” directory (the desired outcome).
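To illustrate the difference (a sketch using the paths from the docs example above):

    # source WITHOUT the trailing /: the "storage" directory itself is transferred,
    # producing .../storagenode-new/storage/storage/
    rsync -aP /mnt/storj/storagenode/storage /mnt/storj2/storagenode-new/storage/

    # source WITH the trailing /: only the CONTENTS of "storage" are transferred (desired)
    rsync -aP /mnt/storj/storagenode/storage/ /mnt/storj2/storagenode-new/storage/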

2 Likes

I’m moving a node off of BTRFS as well, for the same reason. The first rsync indeed takes a massive amount of time (weeks, not days). I don’t know how I’d ever be able to migrate a 20TB node… the initial sync would take at least two months.

It would be awesome if you could, for example, set the node to a ‘maintenance’ mode that would only allow egress and deletes for a while, thus cutting disk I/O.

Or maybe I should just spin up a new node on the same machine (different disk); that would start auditing on the new node while also cutting disk I/O on the main node because of the shared IP.

IMHO:
Multiple small nodes don’t make sense on most NAS systems. If you can spare a disk and NAS bay for Storj, you also know how expensive/valuable those remaining bays are for your own storage.

I moved Storj off of my main volume to a separate disk because of the increased I/O on the whole Array. And although I would definitely dedicate a separate 4-bay NAS to Storj if it was possible, I also know it wouldn’t make financial sense to do so.

You can do that by setting the parameter -e STORAGE="XXTB" in your docker run command, where XX is a value smaller than the space currently USED by your node. For example, if your node currently has 20 TB of used space, you could set it to 10 TB.
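A sketch of what that could look like (hypothetical values; your wallet, email, address, ports and mount paths will of course differ):

    # node with ~20 TB used: allocate only 10 TB so no new ingress is accepted
    docker run -d --restart unless-stopped --name storagenode \
      -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
      -e WALLET="0xYOURWALLET" -e EMAIL="you@example.com" -e ADDRESS="your.host.example:28967" \
      -e STORAGE="10TB" \
      --mount type=bind,source=/mnt/storj/storagenode/identity,destination=/app/identity \
      --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
      storjlabs/storagenode:latest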

6 Likes

You may do the first copy using the cp command as well, but the next passes should be rsync, which will take care of missed or changed files.
As already suggested, you may also specify an allocated size smaller than the used space; this will prevent your node from getting any new ingress.

And yes, both cp and rsync treat paths differently depending on whether they already exist or not, and it’s also important to have the trailing / to copy the contents, not the folder. If you want to copy the folder, you need to remove the trailing / from the source path and use the parent path as the destination; the folder will then be created in that parent location (this works only if the parent path on the destination exists, otherwise the provided destination will be created and the contents copied into it).
However, a second run of cp -r storage/ storage2/ will copy the storage folder into the storage2 folder anyway and produce a storage2/storage structure, unlike rsync -aP storage/ storage2/, which will only sync the contents.
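A quick throwaway demonstration of that difference (safe to run in an empty test directory):

    mkdir -p test/storage/blobs test/storage2 && cd test
    cp -r storage/ storage2/        # storage2 already exists: produces storage2/storage/blobs
    rsync -aP storage/ storage3/    # trailing / on the source: produces storage3/blobs only
    find . -type d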

Well, then it’s not OK…
Anyway, I started by not omitting the slash. I did exactly what was on the page and got the duplicate entry. Then I tried without the slash and got the same. Finally, I removed the last directory on the path and got what I expected (with or without the slash).
rsync has been playing up a bit. On a previous move of a small node, the first few passes went without any problem; suddenly, it would not work without the --delete option.
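For reference, a pass with --delete looks something like this (same paths as the docs example above); it removes files on the destination that no longer exist on the source, which is what you want for the final pass:

    rsync -aP --delete /mnt/storj/storagenode/storage/ /mnt/storj2/storagenode-new/storage/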

That works even better than I expected. Since I have 2 nodes on the same IP, I don’t even lose ingress, because the other node is getting double ingress.
How do I know that? Because I have a third node on a different IP that always gets double ingress compared to the other 2 nodes. That is also how I know there are no other nodes running on either of those networks.

http://storjnet.info/neighbors

2 Likes

Nice… it confirms what I thought.

Continuing with my advice for Synology Storj node users:

I finally managed to migrate my main node (now 5.4TB) out of my RAID5 onto a single disk (sitting in the very same Synology NAS). Thanks to the advice of setting the available space smaller than the actual node size (in order to prevent new files), I managed to do the last rsync in about 1 day.
The conclusion is the following: something is very wrong with at least one of the three following setups for holding a Storj node:

i) Using BTRFS
ii) Using raid
iii) Using raid with BTRFS

My success rate is now (single ext4 disk) more than 99.7%. I was dealing with 80-90% when holding my 2 nodes on the RAID5. The disk noise and the LED flashing were all over the place, and the NAS was a bit unresponsive with CPU IO wait going up to 60% (according to Resource Monitor). And I was not using “record file access time” or “file self healing”.
Further, deleting the Storj files from the RAID5 is a nightmare (whether using File Station or “rm -rf” over SSH). It took me a full day to erase 1 satellite (one of the small ones!). I guess it will take me the same amount of time it took to do the first rsync to delete the whole node.

1 Like

Congratulations!
I believe it’s all three: BTRFS, RAID, and RAID with BTRFS.

Seems @BrightSilence is either lucky or the SSD cache plays a role :wink:

I actually believe all 3 as well. Though I’m using EXT4, not BTRFS, but I’ve heard enough horror stories about BTRFS that I know it’s not a good fit for Storj. As for RAID, it’s effectively just an IOPS amplification system, hitting all disks at once for every operation instead of just a single disk. RAID CAN speed up throughput, but not IOPS. And IOPS is always the bottleneck for running a storage node.

Last month, I had to do without SSD cache for a while after an SSD failure and yeah, I can confirm that IO wait went through the roof and the HDDs became very noisy. This is why most of my nodes now run on external single disks. And I very quickly got a new SSD cache in place as well. (Wanted to upgrade the SSDs in my gaming system to a faster and bigger one anyway. So I’m using those SSDs as cache now… Which… Btw… Is a bad idea. Because they are consumer grade SSDs and not great for endurance in a cache setup. But they are redundantly implemented and I figured I might as well use em up before buying something more fit for purpose.)

So yeah, the only reason I am still running nodes on RAID is to use unused space on a multipurpose array and not let it go to waste. Not because it’s the best setup, but because it’s one that still works well enough, and otherwise the space would just go unused. And since I have other reasons to use SSD cache on that array, Storj gets to benefit from that free of charge (not counting the undoubtedly high additional wear).

2 Likes

There are a lot of tips and tricks on this forum, very useful for any SNO, and if they were added to the official docs, the forum would be much lighter. A Tips and Tricks section right there at the bottom… but as I think this will never happen, my personal node_guide.txt file keeps growing. :smile:
To the subject, I think it’s also best to:

  • stop and rm watchtower for the migration’s duration;
  • stop the FileWalker.
    This will prevent the update restarts and the very high IOPS from the FileWalker (see the command sketch after this list).
    The best way, I think, would be Storj implementing a “migration mode” that would be started with a parameter like --migrate path.
    It would make things much easier for SNOs, advanced and noobs alike.
    This migration mode would:
  • tell watchtower not to update this node;
  • tell the FileWalker not to start;
  • tell satellites to stop sending ingress;
  • do the necessary rsyncs;
  • stop the node and do the last rsyncs/deletes until all the data is transferred.
    The node stopping would prompt the SNO, through Uptime Robot or another monitoring tool, that the migration is about to finish and requires their presence.
    But of course, this mode will not be implemented by Storj, unless maybe there are a lot of requests for it.
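For the first two points on a docker setup, a sketch (the piece-scan option name assumes a reasonably recent storagenode version; double-check it against your current config.yaml):

    # stop and remove watchtower so it can't restart/update the node mid-migration
    docker stop watchtower && docker rm watchtower

    # disable the startup file walker: either append the flag after the image name
    # in your docker run command...
    #   storjlabs/storagenode:latest --storage2.piece-scan-on-startup=false
    # ...or set the equivalent line in config.yaml:
    #   storage2.piece-scan-on-startup: false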
3 Likes

Well, I went the cache route first, though not in the same way as BrightSilence. I put in some NVMe SSDs for the DBs, aiding the disks rather than replacing them. It’s good for refreshing the node dashboard fast, but not that much better at preventing excessive IO.
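In case anyone wants to do the same: I believe the relevant option is storage2.database-dir (a sketch, assuming a docker setup and an SSD volume mounted at /volume1/ssd; please verify the option name against the current docs):

    # bind-mount the SSD location into the container...
    --mount type=bind,source=/volume1/ssd/storagenode-db,destination=/app/dbs \
    # ...and point the node's databases at it (flag goes after the image name):
    storjlabs/storagenode:latest --storage2.database-dir=/app/dbs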

What is your setup? I remember you showing a pic of a disk box. Does this disk box connect to your Synology? Does docker run on the Synology? Is the connection USB or eSATA?

I think this is a bit overwhelming for the majority of users, who will never move nodes. Also, it would hardly be “one solution fits all”.
Maybe it would be better to advise newcomers. If someone says “Hey, I have a bit of space in my synology running BTRFS on a RAID5 and would like to start a node!”, red flags should go off.
Stopping watchtower won’t make much difference if the file walker doesn’t start…

Synology DS3617xs with an SHR2 array with mixed disks (kind of a worst case scenario there). The volume is ext4 though, and accelerated with 2 NVMe SSDs for redundant R/W cache.
The external case is an Icy Box IB-3810u3, connected over USB3. All node DBs are on the internal SSD-accelerated array. Everything runs on docker on the Synology. 10 nodes running on individual disks in that external case + 4 nodes on the internal array + a testnet node.

Oh my… feeling a bit overwhelmed…
I own a DS1621+, and its processor might be a “tiny bit” slower than your Xeon… anyway, as far as Storj is concerned, the faster the CPU, the more wait cycles…
My question was born out of having a USB disk dedicated to Download Station (so that I don’t end up with video files all over the place in my array; once fully downloaded, the files are moved to the array). This disk does make the CPU wait, although I wasn’t sure if that was due to the USB disk itself or to the Download Station software. If your USB box can hold 10 disks without causing too much CPU IO wait, then I guess the problem is the software (mainly when doing searches).
Why the need for a 10-disk box when you have 12 bays?
How many total TB on a single IP?

CPU power doesn’t matter that much for Storj as the bottleneck tends to be IO wait… So I just have a faster CPU waiting for data. I have noticed my external USB drives can cause significant IO wait when the file walker is running on all nodes at the same time.

I had a lot of spare HDDs that were rotated out of the array and replaced with larger HDDs. And I wanted to use them. I do have more than one IP, but my first IP holds about 24TB, total is about 44TB for Storj, but the internal array is used for other stuff as well.

And how do you tell your Synology and/or docker to use different public IPs for different nodes?