Performance: More Than 1 Node On The Same Array Partitioning?

So my idea is to set up two nodes on the same array, but on separate partitions. Example: 1 x RAID 6 array with 12 disks of 4 TB each, which works out to 40 TB of usable storage. The OS is Ubuntu 20.04 and here is my partitioning plan (a rough sketch of one way to carve it up follows the list):
/ - 13 GB (5 GB should be enough because I usually mount /tmp, /var/tmp, /home/$user/.cache and /var/cache as tmpfs, but just in case)
/boot - 1 GB
/var/log - 2 GB (optional)
SWAP - 8 GB (RAM on the machine is 32 GB and I do not care about hibernation) - please note the swap is not placed in the middle of the logical volume of the RAID array; doing that on an array is not the same as doing it on a single disk, where you would put swap in the middle of the platter for the fastest possible access thanks to shorter head movement.
/node0 - 1/2 of what is left
/node1 - 1/2 of what is left
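
For reference, here is a rough sketch of one way this could be carved up with LVM on top of the hardware RAID volume. This is only an illustration: the device and volume group names are placeholders, /boot is assumed to live on a small plain partition outside LVM, and the exact sizes just follow the plan above.

```
# /dev/sda = the 40 TB RAID 6 logical drive; /dev/sda1 = /boot, /dev/sda2 = LVM PV (placeholders)
pvcreate /dev/sda2
vgcreate vg0 /dev/sda2
lvcreate -L 13G -n root vg0
lvcreate -L 2G  -n log  vg0
lvcreate -L 8G  -n swap vg0
lvcreate -l 50%FREE  -n node0 vg0    # half of what is left
lvcreate -l 100%FREE -n node1 vg0    # the remaining half
mkswap /dev/vg0/swap
for lv in root log node0 node1; do mkfs.ext4 "/dev/vg0/$lv"; done
```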

How would you approach this, Linux guys? Is there a better way, according to your knowledge and experience? Would you advise using the noatime,nodiratime options in fstab, even though these are HDDs?
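
To make the question concrete, this is roughly the /etc/fstab I have in mind, using the placeholder LV paths from the sketch above (a real fstab would use UUID= entries, and the tmpfs sizes are just examples). As far as I know, on Linux noatime already implies nodiratime, so listing only noatime should be enough:

```
/dev/vg0/root   /         ext4   defaults,noatime   0 1
/dev/sda1       /boot     ext4   defaults,noatime   0 2
/dev/vg0/log    /var/log  ext4   defaults,noatime   0 2
/dev/vg0/swap   none      swap   sw                 0 0
/dev/vg0/node0  /node0    ext4   defaults,noatime   0 2
/dev/vg0/node1  /node1    ext4   defaults,noatime   0 2
# the usual tmpfs mounts
tmpfs  /tmp        tmpfs  defaults,noatime,size=2G   0 0
tmpfs  /var/tmp    tmpfs  defaults,noatime,size=1G   0 0
tmpfs  /var/cache  tmpfs  defaults,noatime,size=1G   0 0
```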

Next step: attach modular storage arrays (RAID6) for extra nodes on the same server…
Partitioning plan:
/node3 - 1/2 of the MSA volume
/node4 - 1/2 of the MSA volume

More arrays…
/node5
/node6
…

Note: these are SAS-2 drives - 6Gbps

Sounds like a huge investment (12 x $100 4 TB drives + hardware on top). Does that pay off?

Well, first I believe in the project. Second, I get them for about $35. Third - your answer is not performance oriented. Happy New Year! :smiley:

I am still wondering if this is a good idea. It may reduce overall disk and hence RAID array performance, because data from each node would be accessed regularly and in parallel on both partitions. This would force the disks' read/write heads to seek back and forth between the two partitions.

So maybe one partition for both nodes, but in separate folders, is actually the better idea?

Why do you want to run 2 nodes on the same machine?

I run containers, because then they can share storage.
I have quite a few nodes running on the same pool. I suppose VMs might not be very different performance-wise, but I like that it sits directly on the ZFS pool / RAID.
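
Just to illustrate what that looks like (pool, dataset and container names, ports and paths are made up here, and I'm leaving out the usual wallet/address/storage environment variables), each node simply gets its own dataset on the shared pool:

```
# one dataset per node on the shared pool
zfs create tank/storj/node1
zfs create tank/storj/node2

# one container per node, the second node mapped to a different host port
docker run -d --name storagenode1 -p 28967:28967 \
  --mount type=bind,source=/tank/storj/node1,destination=/app/config \
  --mount type=bind,source=/tank/identities/node1,destination=/app/identity \
  storjlabs/storagenode:latest

docker run -d --name storagenode2 -p 28968:28967 \
  --mount type=bind,source=/tank/storj/node2,destination=/app/config \
  --mount type=bind,source=/tank/identities/node2,destination=/app/identity \
  storjlabs/storagenode:latest
```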

IOPS is always the limitation I run into; the filewalker when booting the nodes is particularly rough, takes like 6 hours…
I like the efficiency of the shared storage, because disks sit empty for less time and I can throw in new disks as needed.

I would caution against running too many nodes. I've seen some people try to do hundreds, and let's just say it doesn't turn out well, but a few, especially on a good storage setup, shouldn't be a problem.

New nodes seem to get a lot of small files, while on older nodes the files-to-TB-stored ratio goes down a lot, which I think is by design on Storj Labs' side… making it difficult to just spam-create tons of nodes, because these new nodes get very high IO compared to the TB stored.

How many can you run on your setup? Well, I cannot say with any clarity…
I've been adding mine in stages to make sure I don't overload my setup, and evolving the setup meanwhile.

And let's just say there have been some interesting times… One time I replaced the Docker storage driver, which made my nodes crash randomly… It took a month to figure out what was going on, and for that period my online score was dropping because nodes kept crashing under high IO loads.

It did lead to a lot of hardware optimizations, so it's not all bad; these days it runs very smoothly… so I'm a bit reluctant to tinker with it :smiley: lol

SAS drives are so nice due to the redundant connections on the SAS interface; it really makes SAS so much more stable compared to SATA… Sad to say, I mainly run SATA…
But it's difficult to switch when one already has a lot of SATA drives, and then there is of course the price point… even though used SAS drives can be had at bargain prices at times.

I would say do your setup however you find most practical for your use case… There are node operators running all kinds of different setups and most of them run just fine. It generally boils down to ease of use and to what other projects you have that might share the storage while the storage nodes grow into the capacity supplied.

Because I want to. I have multiple subnets, and I have the hardware for a project which is not fully ready yet, so I can test it and use it for STORJ in the meantime. If STORJ does well, I can buy separate hardware for that project.

Having in mind the simultaneous IO on separate partitions, maybe it is a better idea to have both nodes on the same partition. EXT4 is great, and I think it should do better when both nodes share one partition. I wanted to see what others think, really. In theory, it should be better this way. Looks like not many share their setup details, except you. :grinning:

Give it time, I'm sure lots of opinions and suggestions will show up.
EXT4 is among the fastest for sure. Yeah, using multiple physical partitions… is most likely a bad idea, due to the segmentation.
These days when I think about partitioning stuff, I think in ZFS datasets :smiley:
I just feed the entire disk to ZFS and let it figure things out.
Partitioning can be useful, but most of the time I feel like it just ends up getting in my way.

And doing multiple physical partitions would certainly create more IO.
I was just thinking you might have been referring to VM partitions / VM disks existing on the same pool / array, since the other way has more disadvantages than advantages…

One rarely wants existing partitions to be smaller; more often one just runs low on capacity and has to bother with stitching stuff back together, migrating data around…
No thanks to that… My only partitioned disk at the moment is a 1.6 TB PCIe SSD which has multiple uses in my ZFS pool, so I had to create partitions to make that work.

ZFS has this nice thing called datasets, which are sort of like ZFS partitions; just a lovely feature, but they exist on a pool / array.
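
A minimal example of what I mean (pool and dataset names are purely illustrative); unlike real partitions, datasets share the pool's free space and their quotas can be changed at any time:

```
zfs create -o atime=off tank/node0
zfs create -o atime=off tank/node1
zfs set quota=20T tank/node0    # can be raised or lowered later, unlike a partition size
zfs list -r tank                # shows usage per dataset across the shared pool
```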

If you do very large storage setups you might want to take inodes into consideration, as some filesystems can run out of inodes and thus lose the ability to add more files… but I'm sure that's not very relevant for EXT4… more an NTFS and maybe EXT2 or 3 thing, and maybe some others.
Basically, most filesystems have a maximum number of files they can manage.

So with some of the older ones it can be a pretty real limitation when working with many millions of files, but again, with a couple of nodes I doubt it's a real issue for EXT4.
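
Either way it's easy to keep an eye on (the mountpoint and device below are placeholders); and if you know up front that a filesystem will hold many millions of small files, ext4 can be created with extra inodes by lowering the bytes-per-inode ratio:

```
df -i /node0                       # inodes used / free on an existing filesystem
mkfs.ext4 -i 8192 /dev/vg0/node0   # more inodes than the default ratio of 16384 bytes per inode
```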

Remember to keep a neat folder structure, and as a personal preference I try to keep the top-level folder names short, but that's just because I'm a decades-long Windows user.
It still, IMO, makes life a little easier, and it's easier to read and move between paths when the names are reasonable.

Windows used to have a limit on folder path depth… or it still does, I believe, it's just much longer than it used to be… but I still run my head into that wall on occasion lol

Sure. It can boost performance, but it can also degrade it. In this case, I think it will degrade the performance like I already said.


Let me just throw something in here before I get into it further. If you're building this system for Storj alone, RAID is not a great setup to go with. Storj does many small reads and writes, all of which will hit all disks in your array. IO is going to be your biggest bottleneck, especially if you are going to use multiple subnets. Instead, I highly recommend running 1 node per HDD.

You may not have 12 subnets, but if you have fewer, having multiple nodes within a subnet spread out over multiple disks works MUCH better as a load balancer than using RAID, as every small read/write will only hit one disk, and if you have 2 disks on 1 subnet, they will simply each get half the load from the network.

Of course this means you don't have redundancy. I don't think that will be much of an issue: most HDDs last a long time these days and you're spreading the risk over many disks, so one failure will not have a big impact on income. You'll also have 2 more disks to fill up and maximize performance on. Especially if you have plenty of subnets, you will be able to fill all of them up. Though perhaps you could use 2 HDDs for the OS and other stuff in RAID1, which will be plenty fast to host all that and a node on the remaining space, if you want your setup itself to be redundant.
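
To make that concrete, a rough sketch of what I mean (device names and mountpoints are placeholders):

```
# no array: one ext4 filesystem per data disk
mkfs.ext4 /dev/sdb1
mkfs.ext4 /dev/sdc1
mkdir -p /mnt/node1 /mnt/node2
mount -o noatime /dev/sdb1 /mnt/node1
mount -o noatime /dev/sdc1 /mnt/node2
# then run one storagenode container per mountpoint (each on its own host port),
# so every small read/write only ever touches a single drive
```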

I personally run my nodes on devices that were already online 24/7 for other purposes, so my own setup is likely not as relevant for you. I do use a RAID6 (kind of) array, but only because I'm using spare space on a multi-purpose array. Despite this array being SSD accelerated, the IO load on the drives is still quite significant. If I didn't run this machine for other purposes, I would have one node per HDD. As I expand, I'm planning to move older drives out of this array and run separate nodes on them.

Yes, absolutely. The file walker (at each startup) and garbage collection processes go through all those tiny files frequently, and these settings make a massive difference in runtime for those processes.
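
If the filesystems are already mounted you can even switch it on without downtime and see the effect (the mountpoint is a placeholder; add the option to fstab to make it permanent):

```
mount -o remount,noatime /node0    # apply without unmounting
findmnt -no OPTIONS /node0         # confirm the active mount options
```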

This is determined entirely by the segment sizes that customers use. Storj doesn’t influence that outside of their own test data. So not by design. Everyone would be better off if customers used larger segment sizes for most of their data. Customers would have better performance, Storj Labs would have less overhead on the satellite side and SNOs would have less IO load. The only reason older data was predominantly larger segments is that the old data was mostly test data that used the max size most of the time, while current data is mostly customer data of all sizes.


Yes, at least for now. I would need the hardware later on for something else, and why have it lying around doing nothing?

I would not do this for sure, because of:

Same here, but I have some hardware lying around that I need for some testing purposes.
My actual goal is to use the hardware for something while testing some things on it. STORJ seemed a good idea to run until the project the hardware is actually needed for is fully ready, especially having in mind that I could simply get more hardware for that project separately if STORJ turns out to work well and is worth the separate hardware investment.

And what about different sub-domains, but the same IP? At the same time, having that many nodes would mean endless waiting to get paid for each, and so many fees… :grinning:

That won’t help you at all. Storj will resolve the IP and look at the /24 subnet either way. If you have a single IP, you’re getting traffic for a single subnet.

That’s not true. Payout thresholds are per address, so you don’t get paid any slower. But please make sure you have one vetted node per subnet before starting the next one on the same subnet. Otherwise vetting might take longer.

If you want to use the hardware for other stuff, then yeah, you probably have different redundancy requirements for that. But don't get too caught up on redundancy for Storj. HDD failure rates are pretty low these days, and losing a single node when you have 12 is not that big a deal. I've been running 17 HDDs since I started Storj in Feb 2019 and haven't had a single failure. And this is with HDDs of different age ranges (it even includes ancient 250 GB SATA2 HDDs).

Nope. Per node. So I would stick to RAID 6 with redundancy for sure. I think 2 large nodes instead of 12 small ones is better. STORJ suggests using RAID in the documentation - node setup. I would not use my backup servers to back up each node; I would rather count on the redundancy.

So… back to topic… One partition for both nodes is better, I guess.

The machine has 4 IPs on 4 different subnets, so it is getting 4 nodes: 2 on itself and 2 on an MSA. Not that it really matters, because the question is a bit different - whether to have a separate disk partition for each node or not. Performance-wise, I now think it is better to keep both nodes on the same partition of the server disk array rather than a separate partition for each node. Same on the MSA - 1 logical volume and just different folders, something like the layout sketched below.
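
In other words, one big filesystem on the array and one directory tree per node (the device path and folder names are just examples):

```
mount -o noatime /dev/vg0/array /array    # single ext4 filesystem on the RAID 6 volume
mkdir -p /array/node0/{identity,data} /array/node1/{identity,data}
# each node process/container then gets pointed at its own pair of folders
```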

You are absolutely wrong.

Please point to this, because so far the advice from Storj has always been 1 node per HDD.

Edit: Let me help you out a little. The only place RAID is mentioned in the knowledge base is here: https://support.storj.io/hc/en-us/articles/360032150431-How-can-I-set-up-to-run-multiple-storage-nodes-

If you want to run multiple nodes in the same location, you have a few options:

  1. use the OS functions to make a RAID array from your disks. Please note that we do not recommend this option, especially the use of RAID without parity. Without parity, a single disk failure would result in the loss of the entire node with all its raided disks. On the other hand, using RAID with parity, you will waste part of the array for the parity configuration.

The only place in the node docs is here: How to add an additional drive? - Storj Docs
Which literally starts with:

We recommend to run a new node, if you want to add an additional hard drive.

Those recommendations have also been frequently repeated on the forums. So… you’re gonna have to come up with a pretty tall mountain of evidence that that recommendation has changed.


Yep. Looks like that was updated 9 months ago.

I will have to find it, because this is how I remember it, but I am definitely not running that many subnets on the same machine. I need larger nodes, not more but smaller ones.

In your face! Hardware Requirements - STORJ SNO Book

So yeah - pointed. Now what? CAN and COULD we get back to the topic?