Node migration plan [advice needed]

Fill · October 16, 2023, 8:40pm

My node is not configured according to the recommendations. At the time the setup made sense to me and it worked; only recently have the problems caught up with me, because my node is getting more traffic. In a way I am thankful for the misconfiguration, as I have learned a lot about file systems and network shares. Now on to how my current node is set up, and yes, I know it is bad:

I run 2 servers that manage the OS part and the storage part. I am using TrueNAS with 3 8 TB disks in what is basically a RAID 5. These disks are shared to Proxmox via NFS and then fed into the storage node as virtual disks. This way the node has no clue it is on NFS.

This arrangement works on a lot of other servers I run, but as the Storj docs say, everything about this is a big fat no. So here is the plan I have made to fix this and bring it in line with what is recommended.

1. Lower the space allocated to Storj. This will stop file growth without taking my node offline.
2. Attach another drive (calling it arc) to the node. This would be a normal hard drive connected via a SATA-to-USB adapter.
3. Do a cp -ravu from the OG location to arc.
4. Then do a few rounds of rsync -avhP from the OG location to arc.
5. Shut down the node and run a final rsync -avhP. I don’t think I need --delete as I will be formatting the RAID 5 array.
6. Remove the 3 8 TB disks from TrueNAS and plug them directly into my Proxmox host.
7. Attach each 8 TB disk by itself, formatted as ext4, to the node, no RAID.
8. cp -ravu the node data from arc to one of the drives that is now directly attached to the node.
9. Do an rsync -avhP from arc to the new destination, just as a final check.
10. Fix any configs the node needs for the new file paths and restart the node.
11. Spin up a new node on each of the remaining 2 disks.
12. Make sure the Storj node has the extra resources it needs: 4 CPU cores, 3-6 GB of RAM.
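
The rsync passes in the plan above could be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested procedure: the paths /mnt/og/storagenode/ and /mnt/arc/storagenode/ and the helper names run, sync_pass and migrate_plan are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. The final pass adds --delete, as suggested further down in the thread.

```shell
#!/bin/sh
# Sketch only: prints commands unless DRY_RUN=0 is exported.
# SRC and DST are hypothetical mount points, not paths from the thread.
# The trailing slash on SRC makes rsync copy the directory's contents.
SRC="${SRC:-/mnt/og/storagenode/}"
DST="${DST:-/mnt/arc/storagenode/}"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# sync_pass: one rsync pass; extra arguments (e.g. --delete) are passed through.
sync_pass() {
    run rsync -aP "$@" "$SRC" "$DST"
}

# migrate_plan: the order of the rsync passes.
migrate_plan() {
    # Passes while the node is still running; repeat until a pass is short.
    sync_pass
    sync_pass
    # Stop the node here (the exact command depends on how it was installed).
    # Final pass with the node stopped; --delete removes files from DST
    # that no longer exist in SRC.
    sync_pass --delete
}

migrate_plan
```

Running it as-is only lists the three rsync invocations, which makes it easy to check the paths before anything is copied.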

Am I missing anything, or is there something in my migration plan that I need to change?

daki82 · October 16, 2023, 9:27pm

make sure it’s properly formatted (ext4 i guess) and has the right cluster size (if ntfs; i don’t know ext4 well enough to know if it has clusters). not exfat, and no unreliably big clusters.

can @Alexey or @Toyoo review this?

don’t forget the identity, orders and dbs!

is it 3 drives? or do you install arc too? i think it’s only 3 nodes then, so 3 cpu cores.
some recommend 1 GB of RAM on linux per 1 TB of storage space filled

make sure they are CMR, not SMR.

just my 2 cents

Toyoo · October 16, 2023, 9:28pm

-u is not necessary. Not sure about the implementation of cp from coreutils, but a naïve implementation would make -u unnecessarily slower.

--delete is not about deleting source, it’s about deleting from destination. And while it will not hurt skipping deletion of blob files, it will definitely hurt if you forget to remove database WAL files.
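
To illustrate the WAL point: SQLite in WAL mode keeps `<name>-wal` and `<name>-shm` files next to a database while it is open. If an early pass copies such a file and the source later removes it, a final pass without --delete leaves the stale copy in the destination beside the fresh database file. A minimal sketch for spotting these leftovers follows; the function name stale_side_files is hypothetical, and it assumes file names without newlines.

```shell
#!/bin/sh
# Sketch: list SQLite side files (*-wal, *-shm) that exist in the destination
# but have no counterpart in the source. Both directory arguments are
# supplied by the caller; nothing is deleted.
stale_side_files() {
    src="${1%/}"
    dst="${2%/}"
    find "$dst" -type f \( -name '*-wal' -o -name '*-shm' \) |
        while IFS= read -r f; do
            rel="${f#"$dst"/}"
            # Print the relative path only if the source has no such file.
            [ -e "$src/$rel" ] || printf '%s\n' "$rel"
        done
}
```

An empty result after the final pass suggests there is nothing stale to clean up; running the final rsync with --delete should make the check unnecessary.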

You can take advantage of some of the ext4 formatting tips floating around the forum, they reduce memory use and make nodes slightly faster for some operations.

Other than that I don’t see any holes in your logic.

There might be a faster alternative to copying tens of millions of files though. It’s been a while since I used proxmox, but there might be an option to do a live storage move, which copies whole partitions as opposed to single files. Even if you only applied it to the NAS→USB part, it would still save you a lot of time. Are you using qcow2 on top of your NFS?

And, frankly speaking, even without proxmox, I’d probably try a partition-level copy at least for the NAS→USB part even at the cost of downtime.

Edit: one more thing: consider running nodes inside containers, not VMs.

s-t-o-r-j-user · October 16, 2023, 11:06pm

In case of more traditional approach to copy / sync you may try:

1. rsync with gnu parallel

# for local use:
cd src-dir
find . -type d -o -type f |
  parallel -j10 -X rsync -zR -Ha --relative ./{} ~/path/to/dest-dir/

# for ssh use:
cd src-dir
find . -type d -o -type f |
  parallel -j10 -X rsync -zR -Ha --relative ./{} fooserver:/dest-dir/

Those flags [-zR -Ha] will need adjusting; this is just a very preliminary draft I used for something else.

2. parsyncfp or parsyncfp2 [I have never tried it but looks promising]

http://moo.nac.uci.edu/~hjm/parsync/

3. Oracle File Storage Parallel Tools [fedora based distro is a prerequisite]

Fill · October 16, 2023, 11:14pm

No, I am using the raw format as it’s faster (from what I have seen, but I could be wrong). This limits me a bit, and considering I might be changing filesystems, I feel more comfortable just doing a straight copy of the data.

@s-t-o-r-j-user thanks for the links, was trying to figure that out but was having a problem with it

Toyoo · October 16, 2023, 11:19pm

I see. Still, running something like e2image instead of a file copy would make the process much faster. But there’s nothing wrong with just copying files, it’s just going to take time.
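
A partition-level copy with e2image might look like the sketch below. It only prints the commands. It assumes the source is an ext2/3/4 filesystem that is unmounted while being imaged, and that the target partition is at least as large; the device names /dev/sdX1 and /dev/sdY1 and the function name image_copy_plan are placeholders.

```shell
#!/bin/sh
# Sketch only: prints the commands; nothing is executed.
# $1 = source partition, $2 = target partition (both placeholders).
image_copy_plan() {
    src_dev="$1"
    dst_dev="$2"
    # -r: raw image, -a: include file data, -p: show progress.
    echo "e2image -ra -p $src_dev $dst_dev"
    # Check the copy, then grow it if the target partition is larger.
    echo "e2fsck -f $dst_dev"
    echo "resize2fs $dst_dev"
}

image_copy_plan /dev/sdX1 /dev/sdY1
```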

s-t-o-r-j-user · October 16, 2023, 11:27pm

You are welcome @fill. Pls use with caution - this GNU Parallel thing is a bit weird but should work. Oracle File Storage Parallel Tools are great. I would be interested in any review of parsyncfp; as I indicated, I have never used it. There is also Zettar in case you are very serious, but AFAIK it’s not open source [DOE Technical Report: When to Use rsync?].

Alexey · October 17, 2023, 1:42am

Hello @Fill,
Welcome to the forum!

You must not clone the identity of the migrated node to the new nodes. You must generate a new identity for each of them with identity create and sign each one with its own new authorization token.
See also How to add an additional drive? - Storj Docs
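
As a rough sketch of what that means for two extra nodes, the commands per node would look like this. The node names storagenode2 and storagenode3, the token placeholder and the function name new_identity_plan are hypothetical; the script only prints the commands, and each node needs its own, separately requested token.

```shell
#!/bin/sh
# Sketch only: prints the commands for each new node; nothing is executed.
# Node names are supplied by the caller; the token is a placeholder and
# must be a different, freshly requested token for every node.
new_identity_plan() {
    token_placeholder='<email:token>'
    for name in "$@"; do
        echo "identity create $name"
        echo "identity authorize $name $token_placeholder"
    done
}

new_identity_plan storagenode2 storagenode3
```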

arrogantrabbit · October 17, 2023, 1:55am

This will be much slower than just rsync in one thread. You want less latency, not more.

Also, parallel is an extremely annoying utility (“will cite, I swear”). xargs does the same but better.
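
For comparison, the same fan-out can be expressed with find and xargs. This is a minimal sketch, not a recommendation: plain cp stands in for rsync so the example has no extra dependencies, the function name copy_with_xargs is hypothetical, and the job count is a parameter so that a run with 1 worker can be timed against a run with several.

```shell
#!/bin/sh
# Sketch: copy every regular file under SRC into DST using N workers.
#   $1 = source directory, $2 = destination directory, $3 = workers (default 1)
copy_with_xargs() {
    src="${1%/}"
    dst="${2%/}"
    jobs="${3:-1}"
    mkdir -p "$dst" || return 1
    dst_abs="$(cd "$dst" && pwd)"
    (
        cd "$src" || exit 1
        # Recreate the directory tree first.
        find . -type d | while IFS= read -r d; do
            mkdir -p "$dst_abs/$d"
        done
        # Then copy files, one per cp invocation, with $jobs running at once.
        # Inside sh -c, $0 is the destination root and $1 is the file.
        find . -type f -print0 |
            xargs -0 -P "$jobs" -n 1 sh -c \
                '[ "$#" -ge 1 ] || exit 0; cp -p "$1" "$0/$1"' "$dst_abs"
    )
}
```

Timing `copy_with_xargs SRC DST 1` against `copy_with_xargs SRC DST 10` on the actual disks, with a cold cache each time, would give numbers for this particular setup.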

s-t-o-r-j-user · October 17, 2023, 2:05am

As I understand it, the find part uses 1 thread and the rsync part uses 10 in the provided example, but I will not argue. The same goes for the annoyance, which I called “a bit weird” (just a different word). I have never used xargs, but AFAIK its syntax is even … weirder? Any experience with parsyncfp or Oracle File Storage Parallel Tools?

s-t-o-r-j-user · October 17, 2023, 2:08am

Could you provide some numbers?

arrogantrabbit · October 17, 2023, 2:09am

The performance is limited by seek latency, not throughput. Adding more threads adds more seek latency.
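
A back-of-the-envelope way to see this: if every file costs a roughly fixed amount of seek time, the total is simply files times latency, whatever the drive's sequential throughput. The sketch below does that arithmetic; the function name estimate_hours is hypothetical and both inputs (ten million files, 10 ms per file) are assumed figures, not measurements from this thread.

```shell
#!/bin/sh
# Sketch: rough copy time when each file costs a fixed seek/latency budget.
#   $1 = number of files (assumed)
#   $2 = average milliseconds spent per file (assumed)
# Result is whole hours, rounded down.
estimate_hours() {
    echo $(( $1 * $2 / 3600000 ))
}

estimate_hours 10000000 10   # prints 27
```

With these assumed inputs the copy takes more than a day before any data throughput is counted, and anything that raises the per-file latency scales the total linearly.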

s-t-o-r-j-user · October 17, 2023, 2:19am

I am not saying you are wrong, I am just asking about the numbers. If you take a look at the report I linked above, as far as I understand it, it is not that obvious. Disclaimer: I am not claiming that this report is perfect.

s-t-o-r-j-user · October 17, 2023, 2:23am

And setting the report aside, the same would have to be true of Oracle File Storage Parallel Tools, which I think it is not.

s-t-o-r-j-user · October 17, 2023, 2:23am

I do not argue. :- )

Edit:

You just made me think, and I have never made any measurements. I am also wondering: assuming you are right in the case of one drive, would it also be true when more than one drive is accessed at the same time by separate threads? To be honest, I do not know.

Of course, would be happy to learn more.

arrogantrabbit · October 17, 2023, 2:41am

I don’t have numbers, but as a ballpark, reading from a single drive in 10 threads can be about 1000 times (three orders of magnitude) slower than in a single thread.

Of course, with higher abstractions, the further you get from physical disks – the more you are actually dealing with tiered caches and soon indeed become CPU limited. Then multithreading might actually help: properly designed storage should result in no IO hitting disks when reading metadata and doing most of the things rsync does.

And yet, rsync is the slowest tool possible for the job, regardless of setup: not only does it have overhead in resumable mode (because who takes time to turn it off), it also validates the results written. This is great for critical data, but a storage node does not have anything critical. It’s weird to focus so much on the performance of the slowest and most reliable tool on the planet to copy very unimportant data in bulk.

Moving many small files should be done by sending entire filesystems. This is the only way. On one hand, if you use ext4 you can’t do that (other than by imaging the entire partition, which comes with downtime). On the other – nobody said ext4 is appropriate for running the node. It’s an ancient filesystem, and even one of its developers recommended using btrfs for all new deployments in one of the interviews (I don’t have a link, but will try to find it).
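
As an illustration of sending a filesystem rather than files, a btrfs-based move might be sketched like this. It assumes the node data already lives in a btrfs subvolume and that the target is a btrfs filesystem too; all paths, the snapshot names snap1 and snap2 and the function name send_plan are placeholders, and the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands; nothing is executed.
#   $1 = subvolume holding the node data
#   $2 = directory on the same filesystem where snapshots are created
#   $3 = mount point of the destination btrfs filesystem
send_plan() {
    subvol="$1"
    snapdir="$2"
    target="$3"
    # Full copy from a read-only snapshot while the node keeps running.
    echo "btrfs subvolume snapshot -r $subvol $snapdir/snap1"
    echo "btrfs send $snapdir/snap1 | btrfs receive $target"
    # After stopping the node: second snapshot, send only the difference.
    echo "btrfs subvolume snapshot -r $subvol $snapdir/snap2"
    echo "btrfs send -p $snapdir/snap1 $snapdir/snap2 | btrfs receive $target"
}

send_plan /mnt/src/node /mnt/src/snaps /mnt/dst
```

The second, incremental send is what keeps the downtime short: only blocks changed since snap1 are transferred.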

So the point is kind of moot. The answer to the question “what is the fastest way to use the slowest tool to copy data from a filesystem that is not designed for bulk transfers” is “it does not matter, because unless you send the whole filesystem, random IO will be hopelessly slow”. Hence the recommendation to use rsync for moving the storj node: it’s the single tool that can both copy and sync, and thus simplifies the documentation. The process will take days anyway. Optimizing it makes no sense. Taking days without node downtime wins over a few hours with downtime.

Lesson here – in a modern environment, use a filesystem that supports modern features. Ext4 is usable for a boot drive for IoT. That’s it. (I’m exaggerating here, but only a tiny bit)

s-t-o-r-j-user · October 17, 2023, 2:50am

I made a disclaimer by writing "In case of more traditional approach to copy / sync you may try", and I was not explicitly discussing parallel or distributed file systems, nor did I have ext4 in mind when writing this short post. I admitted that I have never made any measurements of the rsync + GNU parallel approach presented above, nor have I used parsyncfp or parsyncfp2. However, I have been using Oracle File Storage Parallel Tools quite extensively, and my feeling was that it was clearly faster in most cases. :- )

arrogantrabbit · October 17, 2023, 2:53am

This makes sense, and I too would fully expect a purpose-built tool for copying data on a hyperscaler to actually be faster: there is tons of caching involved, and the filesystem performance is far removed from single-disk latency limits.

In the vast majority of posts here, however, the data resides on a handful of HDDs. This includes OP: there is no indication of a special device or other caching being involved that would make things CPU limited. Therefore the goal is to reduce IO, and therefore to minimize the number of threads.

BTW, I’m not arguing either, I’m just perhaps sounding more grumpy than usual, sorry about that

s-t-o-r-j-user · October 17, 2023, 2:57am

No need to feel sorry. Just wanted to add that my post was written as praeterea to @Toyoo’s recommendation of using the whole partitions and / or images. And I would like to reiterate that I made a disclaimer about more traditional approach to copy and sync explicitly. :- )

arrogantrabbit · October 17, 2023, 3:12am

Very nice article. This is all that we need to know:

… many important research datasets are dominated by small files. For example, for deep learning, LOSF reads and small random IOPS for large files are prevalent [24]. But note also that some LCLS experiments have output files in the multiple TB range. Thus, using rsync as a data mover in many cases today would be highly suboptimal.

Most storagenode files are under 16k… And the fellas are testing highly optimized storage systems, not proverbial Synology NASes in the closet.