What should I do to manage 1000 nodes when BIG customers come?

Per @agente’s request.


Let’s do some math: 24 × 20TB × $1.50/TB = $720 just in storage.

Electricity cost: deleted, see below

Does the math work out or not? We also need cooling don’t we?

That’s 200W at 40c/kWh (expensive!).
So that’s 4.8kWh per day, or 144kWh per month. That works out to $57.60 per month, by my maths. Where did you get 80?
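The electricity math above can be sketched in a few lines (a minimal sketch using the thread's own figures: 200W continuous draw, $0.40/kWh, and a 30-day month):

```python
# Sketch of the electricity cost math: 200W continuous at $0.40/kWh.
WATTS = 200
PRICE_PER_KWH = 0.40  # $/kWh, the "expensive" rate quoted above

kwh_per_day = WATTS / 1000 * 24        # 4.8 kWh/day
kwh_per_month = kwh_per_day * 30       # 144 kWh over a 30-day month
cost_per_month = kwh_per_month * PRICE_PER_KWH

print(f"{kwh_per_day:.1f} kWh/day, {kwh_per_month:.0f} kWh/month, "
      f"${cost_per_month:.2f}/month")
# → 4.8 kWh/day, 144 kWh/month, $57.60/month
```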


My rule of thumb is 10W/HDD to cover its use, any controllers, fans/cooling etc. So a 4U 24-bay SAS JBOD would work out to a conservative 240W.

An older 8c/16t Intel on a consumer motherboard with 32GB RAM and logs/DBs on SSD can comfortably run 48 nodes today (during the performance tests). Unknown if there’s enough CPU left to do 96 nodes if RAM was bumped to 64GB: probably? Say 150W for this controller system.

So 4 of the storage JBODs (4 × 240W) plus one controller (150W)… call it 1110W as a strict sum, or around 1250W continuous with some headroom, for around 100 nodes and almost 2PB (20TB × 24 × 4)?
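The same sums can be sketched as code (the 10W/HDD rule of thumb and the 150W controller figure are the thread's own estimates; the ~1250W quoted above presumably adds some headroom over the strict sum):

```python
# Sketch of the JBOD power and capacity sums for the 4-enclosure setup.
W_PER_HDD = 10        # rule of thumb: drive + controllers + fans/cooling
BAYS = 24             # 4U SAS JBOD
JBODS = 4
CONTROLLER_W = 150    # 8c/16t consumer box running the nodes
TB_PER_HDD = 20

jbod_w = W_PER_HDD * BAYS                 # 240W per enclosure
total_w = jbod_w * JBODS + CONTROLLER_W   # strict sum, no headroom
raw_tb = TB_PER_HDD * BAYS * JBODS        # raw capacity

print(f"{total_w}W for {BAYS * JBODS} drives, "
      f"{raw_tb}TB (~{raw_tb / 1000:.2f}PB)")
# → 1110W for 96 drives, 1920TB (~1.92PB)
```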

It’s interesting to think about. The hardware may be simple: but getting /24 IPs for those 100 nodes with no peers may be difficult.

Thanks for making the thread so we can play what-if!


Does anyone already know how the /24 rule applies to the “new normal”? How will this affect the nodes? If the traffic is enough to fill multiple 20TB disks on a single IP, it might be more manageable?


a brain fart moment on my end :rofl:

Yeah, but the discussion started on whether we should add another JBOD or another host (motherboard + CPU + RAM). That’s the only difference I’m measuring. If the drives would be in a JBOD they’d be powered anyway, so there’s no point in counting that.

I think most SNOs will do what (I think) you’re suggesting: as nodes fill, park them beside others behind the same single IP (just a different port). And if they did still have a separate /24, they’d assign it to a new incubated node to start growing.

So although Storj may like to see more unique /24s come online as more data flows… I think they’ll see about the same number of unique /24s, with more and more full nodes piling up behind the same IP.

I may have missed the nuance in the initial questions: but to me I’d only daisy-chain new JBODs… unless the system running the nodes was pinned on RAM or CPU. Especially if we’re talking about high ingress leading to many full nodes: those full nodes will be idling when full, only accepting new data long enough to refill trash.

You are forgetting the “new normal”, i.e. bursts of TTL data. That way the drives would never stay full for a significant amount of time. What if garbage collection (GC) and filewalker (FW) runs get triggered simultaneously as well?

Do you think background GC/FW will be a big deal? They aren’t now. At a steady state, drives will be deleting the same amount of data per day as is coming in, to remain full… so that’s perhaps dozens of GB/day optimistically, not TB/day.

I was thinking of this thread more as ideas on how to economically run 100+ nodes if you needed the raw space. If you stick with one node per HDD I don’t think you run into IO issues. My guess is beefy consumer desktops are probably good to 75-ish nodes… and that more server-type setups can probably do 150+ without breaking a sweat.

The question is “How many nodes with this piece of HW?”. If what @Roxor said is true we don’t have a problem: an 8-core single CPU with 32GB RAM comfortably runs 48 nodes, so scale up to the sky.
My experience (hypothetically… TOS is TOS) is that with no caching setup, a basic ext4 system and quite full nodes (14/18TB), you cannot easily run 48 nodes on 32GB of RAM.
Will moving all DBs to SSD fix everything? One SSD for 48 nodes? Two? Three? RAID 0? Good to know; then it would be useless to think about a more complex caching system.
I actually think it’s not enough, but I could be wrong. You need to work toward a full metadata cache to easily manage a big number of nodes and avoid big iowaits.

The question about expenses and how best to scale your hardware needs this clarification first.

My approach to this is to scale up the rigs (hosts) instead of figuring out ways to run more on less hardware. In the grand scheme of things, scaling up to more hosts is actually beneficial because I don’t need to manage any arrays.

There are definitely many ways that work. I like just adding power to an enclosure, one SAS cable to the HBA (or another JBOD)… then start shoving disks in and formatting/mounting them. No array management: the OS sees each as a separate HDD.

So say $350-ish for the enclosure + cable… plus drives. No new OS install (or entire computer), nothing new to update or patch… and your power goes up by 200-250W. But you do need to already have a system beefy enough to run 24 more nodes: and while CPU may be almost free, you will need more RAM.
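For a rough feel of what that incremental JBOD costs over time, here is a minimal sketch reusing the thread's figures (~$350 enclosure + cable, ~240W extra draw, and the $0.40/kWh rate quoted earlier; your electricity rate and drive prices will differ):

```python
# Hedged sketch: incremental cost of adding one 24-bay JBOD (drives excluded).
ENCLOSURE_USD = 350   # enclosure + SAS cable, thread's rough figure
EXTRA_WATTS = 240     # ~10W/HDD rule of thumb for a full 24-bay shelf
PRICE_PER_KWH = 0.40  # the "expensive" rate from earlier in the thread

monthly_power_usd = EXTRA_WATTS / 1000 * 24 * 30 * PRICE_PER_KWH

print(f"one-off ~${ENCLOSURE_USD}, "
      f"then ~${monthly_power_usd:.2f}/month in power")
# → one-off ~$350, then ~$69.12/month in power
```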

That doesn’t scale forever though: if Th3Van has 150-ish HDD slots connected to one multi-HBA server, maybe that’s the most we could consider reasonable for a homelab? Or 100 nodes? Then you need to add another rig…


My plan is to start adding Zen 5 rigs with the next hardware upgrade cycle (that doesn’t mean HDDs; those are always added as they fill). Nothing too crazy, just something with 12 cores/24 threads and maxed-out RAM.

Hopefully they come down from their initial “OMG so shiny!” price :joy:
