Ow, this is new to me.
Even with a lot of processing and memory?
I was planning on using a Xeon E5-2650L processor (14c/28t) and 128 GB per node (4x 90 TB RAID-Z1).
If it would cost you ~12k in total, then I would say you might go for it. It's just a hobby anyway, nothing guaranteed. A lot of tweaks and fixes might be needed, as we SNOs can be subject to changes in customers' main use cases. You have to be ready to look after the nodes daily; it's not completely plug and play, it's for people who really feel they want to do it. If you're good at programming, maybe you'd rather spend your time there. There is no big money in being an SNO, though $600-700 per month is very realistic with 1024 TB of dedicated space.
Each TB is going to contain 3-4 million files… and once a node starts getting larger than around 20TB the background utilities a node runs can take forever. CPU doesn't help as it's all disk IO. Memory does help as a cache, and clever use of SSDs can speed things up… but with HDDs only doing 100-150 IOPS, at some size the math just isn't in your favor.
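To put rough numbers on that (my own back-of-the-envelope, assuming one IO per file, which is optimistic): a 20TB node at ~3.5 million files per TB holds ~70 million files, and at 150 IOPS a filewalker pass that touches every file once needs about 70,000,000 / 150 ≈ 470,000 seconds, i.e. roughly 5-6 days of pure disk IO.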
A better use of those CPU cores and RAM is to skip RAID-Z1, use no parity, and just run one node per HDD. Like if you have 24 20TB HDDs… run 24 nodes (in docker, to make maintenance easy).
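For illustration, a minimal sketch of what one-node-per-HDD could look like, roughly following the standard docker run from the Storj docs (wallet, hostname, paths and names below are placeholders, not anything from this thread):

```
# node 01 lives entirely on /mnt/hdd01 (its own identity + data)
docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 28967:28967/tcp -p 28967:28967/udp \
  -p 127.0.0.1:14002:14002 \
  -e WALLET="0xYourWallet" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.ddns.example.com:28967" \
  -e STORAGE="18TB" \
  --mount type=bind,source=/mnt/hdd01/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/hdd01/storagenode,destination=/app/config \
  --name storagenode01 storjlabs/storagenode:latest
```

Repeat per HDD, changing only the mount paths, the container name and the published ports.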
Best practice, if you have a single IP, is to add capacity as the existing capacity is filled. Many node operators use a VPN/VPS to get around the subnet rule and run more nodes that way (not recommended). Even then it can be a struggle to fill drives due to a lot of ongoing garbage cleanup.
Future growth is customer dependent and, as SNOs will tell you, it is very random and not easily predicted.
That all being said, some SNOs with patience have grown their node farms and generate decent returns. It just takes a lot of time to get there.
Memory does help as a cache, and clever use of SSDs can speed things up… but with HDDs only doing 100-150 IOPS, at some size the math just isn't in your favor.
I am using TrueNAS and can add some SSDs as L2ARC cache… maybe this can speed things up?
I was expecting to split the nodes across individual disks as you said, but the problem is having 24 nodes under the same IP or load balancing them between 2 or 3 IPs.
Best practice, if you have a single IP, is to add capacity as the existing capacity is filled.
Yes, this is my point in creating 90 TB nodes… I can add more internet links (or VPNs), and more IPs as well, but at some point this will become unworkable as the internet cost will exceed the revenue.
I think you all forgot the Bloom Filter problem.
For the new-to-become SNO: the Bloom Filter, as I understand it, is a file generated on Storj servers from a backup copy of databases maintained by a satellite, and it contains the list of pieces a node should keep.
When the storagenode receives and processes it, there will be about 10% false positives, meaning it will keep (retain) the pieces that must be kept plus roughly 10% of the pieces that should have been deleted (aka sent to trash).
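A rough illustration of what that means in practice (my own numbers, just for scale): if the satellite expects your node to hold 10 million pieces and there are 2 million garbage pieces on disk, a Bloom Filter with a ~10% false-positive rate will wrongly retain roughly 200,000 of those garbage pieces on that pass; repeated passes shrink the leftover garbage further.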
Until a few months ago (a year maybe?), the Bloom Filter was too small for bigger nodes, over 10TB of data, and those nodes didn't clean up all the trash, ending up filled with should-have-been-deleted, unpaid pieces. The satellite pays you according to what it knows your node should have, not according to what the node actually stores. If it's filled with undeleted trash, that space is unpaid and blocks new paid pieces from being stored.
After some months of improving the BF, they increased its size and it can now properly clean big nodes. I don't know the new limit, but I believe they made it safe for 1-drive nodes, which at least until next year means under 50TB (the biggest CMR drive commercially available is 24TB, with 30TB coming in the next months).
So, if you plan to go the RAID way, you are basically begging the Storj team to generate Bloom Filters dedicated to your huge nodes, and expecting them to dedicate resources just for the-one-that-didn't-listen-to-recommendations.
Just stick with 1 drive per node, no RAID, ext4, noatime, and some SSD or NVMe drives for databases, logs, orders, and the metadata cache. Add more drives as the first ones fill up. Keep in mind that all nodes behind the same /24 subnet get data as a single node.
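Purely as a sketch (device names and paths are placeholders, and double-check the exact database-dir option name against the current docs), that setup could look roughly like this:

```
# /etc/fstab: one ext4 data disk per node, mounted with noatime
/dev/disk/by-id/ata-EXAMPLE-DISK-1-part1  /mnt/hdd01  ext4  defaults,noatime  0 2

# in that node's config.yaml, point the SQLite databases at the SSD/NVMe:
#   storage2.database-dir: /mnt/ssd/node01/dbs
```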
You can go with 2 nodes per subnet, just to share the load. You will get some edge over the others that have only one.
Question for storjlings:
- Do huge nodes create unnecessary stress on Storj servers for Bloom Filter generation, and should the max node size be limited?
After some months of improving the BF, they increased its size and it can now properly clean big nodes.
I don't think so. BFs are still not big enough for the largest available HDDs.
but the problem is having 24 nodes under the same IP
No problem at all. You would have to configure 24 unique external ports and 24 unique ports for the dashboards (contact.external-address and console.address) if you use a docker setup. For a Windows setup (or with --network host in the docker setup) you would need roughly four times more unique ports (server.address, server.private-address, contact.external-address and console.address).
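A minimal sketch of how that looks with the default docker (bridge) setup, where the ADDRESS variable corresponds to contact.external-address and the dashboard stays on 14002 inside the container while you publish it on a different host port (hostname, ports and names are placeholders; other flags omitted for brevity):

```
# node 1 (default ports)
docker run -d ... -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
  -e ADDRESS="your.ddns.example.com:28967" --name storagenode01 storjlabs/storagenode:latest

# node 2: unique external port and unique dashboard port on the host
docker run -d ... -p 28968:28967/tcp -p 28968:28967/udp -p 127.0.0.1:14003:14002 \
  -e ADDRESS="your.ddns.example.com:28968" --name storagenode02 storjlabs/storagenode:latest
```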