Request for Advice on Optimizing Storj Node Configuration with High-End Hardware and Network Setup

Question:

Hi Storj community,

I’ve set up a Storj node with a robust hardware and network configuration and would appreciate advice on whether this setup is well-optimized for Storj or if adjustments are needed to meet performance and reputation standards.

Here are the key details of my setup:

Hardware:

Host: HP EliteDesk 800 G2 SFF, Quad Core i5-6500, 32GB DDR4 RAM, 256GB + 1000GB SSD

L2ARC: 2 × Samsung T9 Portable SSD 2TB (2,000MB/s, USB 3.2 Gen 2x2)

vdev1: 3 × Seagate Expansion Desktop 6TB External Hard Drives (USB 3.0)

vdev2: 3 × Seagate Expansion Desktop 20TB External Hard Drives (USB 3.0)

File System: 2-level ZFS architecture (most activity expected to hit SSD cache)

Network:

Router: TP-Link AXE5400 Tri-Band Wi-Fi 6E, up to 5400 Mbps, 1.7 GHz Quad-Core CPU

Connection: High-end Ethernet with realistic measured speeds of 900 Mbps (up/down)
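
For reference, the ZFS layout described above would be created roughly like this (a sketch only; the pool name and device paths are illustrative placeholders, not my real ones):

```shell
# Two raidz1 data vdevs (3 x 6TB and 3 x 20TB), plus the two 2TB SSDs as L2ARC.
# "storjpool" and the /dev/sdX paths are placeholders.
zpool create storjpool \
  raidz1 /dev/sdb /dev/sdc /dev/sdd \
  raidz1 /dev/sde /dev/sdf /dev/sdg
zpool add storjpool cache /dev/sdh /dev/sdi
```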

Node Settings:

• storage2.min-client-upload-speed: 0.5 Mbps → 1 Mbps

• storage2.max-concurrent-requests: 40 → 80

I aim to prioritize speedier users (75th percentile and above), with:

• 90% ingress at 60 Mbps and 10% egress at 150 Mbps.

• Bandwidth allocated as 1 Mbps per slot, with a maximum of 20–80 concurrent requests.
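
For clarity, this is roughly how I apply those settings on a Docker-based node (a sketch only; the paths are placeholders, and the exact flag names and value formats should be double-checked against `storagenode setup --help` for your version):

```shell
# Standard storagenode Docker invocation with the two tuned flags from above
# appended. Identity/data paths are placeholders, not my real ones.
docker run -d --restart unless-stopped --name storagenode \
  -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  storjlabs/storagenode:latest \
  --storage2.min-client-upload-speed=1Mbps \
  --storage2.max-concurrent-requests=80
```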

My Questions:

  1. Will the Storj network and its clients appreciate this setup, given my high-end Ethernet connection and hardware configuration?

  2. Is my focus on accommodating higher-speed users (75th percentile) likely to hurt my node’s reputation or success rates due to potential exclusion of lower-speed users?

  3. Are there recommended adjustments to my node settings (e.g., min-client-upload-speed, max-concurrent-requests, etc.) to better balance resource usage and improve compatibility across client types?

  4. Are there potential bottlenecks in my setup, such as ZFS overhead or USB 3.0 external drives, that might affect node performance during peak traffic?

I want to ensure my node operates efficiently and provides a great experience for clients while maintaining a strong reputation in the network. Any insights or recommendations are highly appreciated!

Hello, and welcome to the forum.
To be honest, it is absolutely overkill. Storj doesn't need redundancy.
I don't think you will even recover the electricity cost unless there is a miracle: Storj lands a client with a big load and big bandwidth needs, in which case your setup would be fine and would pay off.
Maybe next year ingress will be bigger, and then in a year or two it may start to be profitable.
Just don't set unrealistic expectations.


Thank you! Actually, I am starting with just the first vdev (3 × 6TB in RAID5 / Z1). My thinking was to be HA with respect to losing any one drive, and thus keep the data, stay helpful, and hopefully keep my reputation/standing and the traffic that comes with it.

You can start with one or two drives and no redundancy, then add more as you fill the nodes. I have seen cases where just a RAID controller problem killed a node; if you don't put everything in one basket and instead run several nodes, then when one fails you still have the others. Also, redundancy costs you one HDD; run an additional node on it instead, and that becomes your redundancy if you accidentally lose a node.


I mean, it's fine. Overkill, actually.
What operating system are you using?
You are brand new to Storj hosting, right? You have no accumulated data so far?

Advice for the drives.

  • storj data fills extremely slowly; it takes years. I have about 8 nodes, and the most data on any of them is 4TB. I had closer to 7TB in the summer when there was a surge of test data.

  • don’t use any redundancy (RAID). You’re not paid for it.

  • set up a new node, one for each disk. This actually gives you more performance (due to the independent random I/O across the nodes). Also, if one disk fails, you just destroy that node; the others still operate fine.

  • I haven't noticed any need to mess around with concurrent requests or minimum upload speed. The defaults are fine, and changing them doesn't really affect anything.

  • due to slow fill rates, if your electricity is cheap, I would take the 3x 6TB drives and set up 3 nodes. use the 20TB for other things. If electricity is expensive, I wouldn’t use the 6TB drives at all, but use the 20’s, but only part of the space, and use the rest of the space for other things. the max theoretical size for a single node is 24TB, but practically I don’t think people have gotten over 15TB or so.

  • under current (low-load) conditions, you honestly don't even need an L2ARC at all. Under the flood of test data from the summer, an L2ARC (or another SSD cache scheme) became necessary. You'll want to set it to cache metadata only, though, to conserve space. Trying to cache the data itself is pointless because access is so random, but the metadata cache is very helpful for the "housekeeping" chores a Storj node runs, including used-space filewalkers and garbage collection.

  • You can run an L2ARC on a single SSD. For instance, I have a used SAS SSD with a peak speed of about 800MB/s, and it can service the metadata of 8 nodes without breaking a sweat.

  • You will probably want to host each node’s database on a separate SSD rather than on the storage drive.

  • You can enable the experimental badger cache, which helps the performance of the used-space filewalkers that run on startup.

  • the biggest thing I can't comment on from experience: all your drives are external. That could cause reliability problems (drives getting unplugged). It could also become a bottleneck if all the drives go through one USB controller. Maybe. The raw throughput from the drives is fairly small, but there may be some subtler voodoo with USB.
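
The metadata-cache, database-location, and badger points above can be sketched like this (the pool name and paths are placeholders, and the exact config keys are assumptions to verify against your node version's config.yaml):

```shell
# Cache only ZFS metadata in the L2ARC; piece data is too random to cache usefully.
# "storjpool" is a placeholder pool name.
zfs set secondarycache=metadata storjpool

# Then, in each node's config.yaml (keys assumed, verify for your version):
#   storage2.database-dir: /mnt/ssd/node1-db   # databases on a separate SSD
#   pieces.file-stat-cache: badger             # experimental badger cache
```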


Repeating the opinions of others:

  • Storj doesn't need mirroring or parity. But you can use it if your config is unreliable (like an array of external USB drives 😉).
  • At current rates: you probably wouldn’t fill even one 6TB in the next year. So no need to go overboard. Unless you have extra different-/24 IP addresses, more nodes won’t help you fill faster. Start with one HDD.
  • Storj workloads look like random IO, so L2ARC doesn’t help much. But a special metadata device on SSD helps a lot (as many housekeeping tasks read thousands/millions of filenames)
  • Having the node databases on a SSD also helps
  • You don't have to fiddle with the upload/request tunables: Storj rarely pushes significant traffic (it has happened once, for two months of test data, and perhaps never again). It's like buying race tires for the Civic you commute in: if it makes you feel better, go for it, but it won't make you faster.
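
A special metadata vdev, as opposed to an L2ARC, would look roughly like this (pool and device names are placeholders; note that a special vdev is pool-critical, so losing it loses the whole pool, hence the mirror):

```shell
# Add a mirrored special vdev so pool metadata lives on SSD.
zpool add storjpool special mirror /dev/sdh /dev/sdi
# Optionally also place small blocks (<=16K) on the SSDs:
zfs set special_small_blocks=16K storjpool
```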

Basically you don’t need anything fancy: all you can really do is keep it online, and wait…


Agree with all above opinions.
For context, I am running a node currently hosting 10TB (of an available 18TB) on a USB enclosure plugged into a Raspberry Pi 5.
UniFi router and a 500/100 Mbit FTTP connection.
This setup is not even breaking a sweat. It coped pretty well even during The Great Stress Test of 2024™ 🙂


Reputation affects only the node in question, not any others you have or will have.
Each node is rated independently. No need for redundancy, as others said. Start with only one node per drive per /24 subnet; anything above that is just unneeded electricity cost.
