How best to get started with multiple nodes?

SwisherSweet · June 19, 2023, 7:01pm

Hi there, I’m interested in becoming a new Node operator. I’m learning as much as I can about getting started by reading articles/blogs and watching YouTube videos, but I’m not finding answers to key questions.

I have multiple Synology NAS units, multiple computers, and a ton of unused hard drives lying around. I have a strong fiber connection to the internet with multiple static IP addresses.

I want to set up and grow multiple nodes at once. I’ve seen in YT videos that if you are behind a single IP, the satellites will only fill the equivalent of one node across all your nodes, ergo there is no advantage to running multiple nodes. Is this true?

Given that I have multiple machines, lots of drives, lots of bandwidth and multiple static IP addresses, what would be the best way to set up my nodes to maximize revenue?

Thank you.

Toyoo · June 19, 2023, 7:14pm

You should not be maximizing revenue. You should be maximizing profit. Profit = revenue - costs. Your costs are probably your electricity bills. The more separate NAS units and hard drives you are using, the higher the costs. Hence you should start with a single drive and a single unit, and expand only when your storage to date is full.

You can also get revenue by selling the unused drives and computers.

SwisherSweet · June 19, 2023, 7:19pm

Thanks. Fair point about profit. Since my equipment/drives are already paid for, my only additional cost is electricity, which in my area is pretty inexpensive.

What I’m trying to find out is this: assume I have 2 x Synology NAS units, each with two drives in a RAID for redundancy / increased uptime. If each runs a single node instance, will both be filled at the rate of a single node if they are behind a single IP address?

If so, can the nodes potentially be filled quicker if they have separate IP addresses?

You can also get revenue by selling the unused drives and computers.

Selling unused drives and computers doesn’t generate recurring revenue, but setting them up as Storj nodes does.

Toyoo · June 19, 2023, 7:36pm

Selling the unused drives, which would not bring profit with Storj anyway, and then investing in govt bonds at 10% (at least here) does. Selling them and getting larger ones instead to optimize electricity costs is also a good option; for some reason used hard drives hold their value quite well. Alternatively, use them to host Chia plots and slowly replace plots with more profitable Storj data as your nodes grow.

You will see different answers from different people here. I’m in the camp believing that redundancy for a Storj node does not pay off. Parity RAIDs make operations slower, making it more difficult to cope with high traffic and losing money through latency, while mirrored RAIDs waste a lot of disk space. I would probably go with RAID60 at some point (or likely the zfs equivalent), the overhead is small enough then, but I’m not that large yet.

SwisherSweet · June 19, 2023, 7:53pm

Selling the unused drives which would not bring profit with Storj anyway, then investing into govt bonds at 10%

Not a bad idea if gov bonds yielded 10%.

Selling them, and getting larger ones instead to optimize electricity costs is also a good option—for some reason used hard drives hold quite a good value.

Fair point. Thank you.

I’m in the camp believing that redundancy for a Storj node does not pay off

I’ve seen YT videos of folks connecting a large drive to a Raspberry Pi in a shoe closet. What happens when their drive fails? Seems like that would undo all the trust they earned and they would have to start over. Drives always fail, eventually.

Are you suggesting that it is more profitable to let it ride without any redundancy or backups, knowing when the drive fails you lose all the data and have to start over?

Is there anyone actually hosting Nodes at scale and earning any non-trivial profits from this?

Toyoo · June 19, 2023, 8:16pm

Drives indeed fail, but if you look at the usual annualized failure rate (AFR) figures, they don’t fail nearly often enough to worry about (for Storj node purposes), unless they are very old. It’s actually far more common to see failures of other, replaceable components of the storage system. I can live with a 2% yearly risk of a drive failure. Right now I’m running nodes on five drives, all close to full. If one of them fails, I’ve still got four more.

Besides, of that 2%, not all failure modes actually matter for Storj. A failure where a small part of your drive develops bad blocks is pretty much nothing. One of the drives I run for Storj has had bad blocks for many years, and it has still earned several times what it was worth. Yet a regular RAID would immediately mark the drive as bad and consider your array degraded.

RAID and the like are great when you cannot lose even a single byte of your dataset. Losing 2% of a Storj dataset is considered acceptable.
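
For anyone who wants to check the arithmetic behind the five-drive example, here is a rough sketch. It assumes failures are independent, that every failure loses the whole drive (pessimistic, given the point about bad blocks above), and that all drives hold the same amount of data. The function names are made up for illustration.

```python
def p_at_least_one_failure(afr: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails in a year."""
    return 1.0 - (1.0 - afr) ** drives


def expected_fraction_lost(afr: float, drives: int) -> float:
    """Expected share of all stored data lost per year with one node per drive.

    Assumes equally full drives and that a failed drive loses everything on it.
    """
    expected_failed_drives = afr * drives
    share_per_drive = 1.0 / drives
    return expected_failed_drives * share_per_drive


afr = 0.02   # 2% yearly failure risk per drive, as in the post above
drives = 5   # five single-drive nodes

print(f"At least one drive fails this year: {p_at_least_one_failure(afr, drives):.1%}")
print(f"Expected share of data lost per year: {expected_fraction_lost(afr, drives):.1%}")
```

Under these assumptions the chance of seeing some failure in a year is about 9.6%, but the expected share of data lost stays at 2% no matter how many drives you spread it over.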

ACarneiro · June 19, 2023, 8:38pm

Oh, God!
Not the “To RAID or not to RAID” discussion again!

snorkel · June 19, 2023, 8:42pm

Nodes behind the same /24 subnet, i.e. 123.123.123.X, together get the same ingress as one single node, but for egress there is no restriction. If you have many public IPs, check how many are in the same /24 subnet. Then you can start one node for each /24 subnet. When the first drive fills up, you can start another node in the same /24 subnet.
Many recommend using one drive per node, no RAID. The best FS so far seems to be ext4 on Linux machines. Best OS: Linux. Also, given all the problems I have seen on this forum with USB connections, I recommend avoiding USB ports for drives. Synology also seems not to support tcp_fastopen, which wins you more races, so it is better to use PCs/servers with Linux than Synologys. But it is up to you; a NAS is made to run 24/7 and has low power draw. There are pros and cons for each machine.
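
To check how many of your public IPs share a /24, something like the sketch below works. It uses Python's standard `ipaddress` module and assumes IPv4 addresses; the addresses shown are placeholders from the documentation ranges, not real node IPs, and `group_by_slash24` is just an illustrative name.

```python
import ipaddress
from collections import defaultdict


def group_by_slash24(ips):
    """Group IPv4 addresses by the /24 network they belong to."""
    groups = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing /24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(net)].append(ip)
    return dict(groups)


# Placeholder addresses (RFC 5737 documentation ranges); substitute your own.
public_ips = ["203.0.113.10", "203.0.113.77", "198.51.100.5"]

groups = group_by_slash24(public_ips)
for subnet, members in groups.items():
    print(subnet, members)
print("Distinct /24 subnets:", len(groups))
```

Each distinct /24 in the output counts as one "node" for ingress purposes, so the last number is how many nodes you could usefully start at once.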

snorkel · June 19, 2023, 8:45pm

You can check my posts about my experience with Synology NASes, because I run 8 of them:
https://forum.storj.io/t/my-docker-run-commands-for-multinodes-on-synology-nas/22034

SwisherSweet · June 19, 2023, 9:40pm

Thank you everyone for your comments and suggestions.

The nodes behind the same /24 subnet, aka 123.123.123.X, get the same ingress as one single node, but for egress there is no restriction. If you have many public IPs, check how many are in the same /24 subnets. Then you can start a node for each /24 subnet. When first drive fills up, you can start other node in the same /24 subnet.

All my IPs would be on the same subnet. If I started with two Synology NAS, would they just split the ingress and effectively give me some protection from a single node failing?

Many recommend using one drive per node, no RAID. Best FS so far seems to be ext4 for Linux machines. Best OS - Linux.

I’m still struggling to wrap my head around not having redundancy. Is there a good thread on the topic? It would seem that once you get enough data stored and you are cash-flowing, redundancy would be worth it. I get that it may not be worth it at the beginning.

Synology seems to not support tcp_fastopen, which wins you more races, so better use PC/servers with Linux, than Synologys.

So nodes that have lower latency are rewarded with more egress traffic than those that don’t? Therefore, to “win more races” one would need a fast internet connection, a low-latency network, and fast file access (particularly, the ability to locate the files on the disk)? I’m assuming high-bandwidth sequential transfers also play a part?

Thank you.

Alexey · June 20, 2023, 4:06am

Yes, but it is better not to start them at the same time. Each node must be vetted, and an unvetted node can receive only 5% of the customers’ uploads until it is vetted. To be vetted on a satellite, it must pass 100 audits from that satellite. For a single node in a /24 subnet of public IPs this should take at least a month (or more).
Since all nodes behind the same /24 subnet of public IPs are considered one node for uploads, the vetting process can take as many times longer as there are such nodes.
So, start the next node when the previous one is almost full, or at least vetted.

But generally yes: since all nodes behind the same /24 subnet of public IPs together get the same ingress as one node, the group acts like an array, distributing data between nodes, so losing one node would mean losing only that node’s part of the common data, not the whole array as in the case of one big node on one RAID.

See also: Topics tagged raid

almost.
When the customer wants to upload a file, their uplink encrypts the file, splits it into segments, applies erasure coding, forms pieces, and requests from the satellite 110 nodes in unique subnets for each segment, then starts the uploads in parallel. When the first 80 have finished, the remaining uploads are cancelled (because only 80 pieces need to be stored for each segment).
When the customer wants to download a file, their uplink requests from the satellite 39 of the 80 nodes for each segment and starts the downloads in parallel. When the first 29 have completed, all the remaining downloads are cancelled (the uplink needs only any 29 pieces out of 80 to reconstruct a segment).
Thus the nodes that are fastest from the customer’s location get most of the pieces and most of the egress.
So any advantage (avoiding a slow disk subsystem and/or filesystem, supporting QUIC and tcp_fastopen, etc.) potentially gives your nodes more wins over others.
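
A toy simulation can make the "race" idea concrete. This is only a sketch: the 110/80 and 39/29 figures come from the description above, but the exponential latency model, the mean latencies, and the function names are invented for illustration and say nothing about how satellites actually select nodes.

```python
import random


def race(latencies, keep):
    """Return the indices of the `keep` fastest transfers; the rest are cancelled."""
    order = sorted(range(len(latencies)), key=lambda i: latencies[i])
    return set(order[:keep])


def win_rate(my_mean, other_mean, started, keep, trials=2000, seed=0):
    """Estimate how often node 0 finishes among the `keep` fastest of `started`.

    Latencies are drawn from exponential distributions (a made-up model).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        latencies = [rng.expovariate(1 / my_mean)]
        latencies += [rng.expovariate(1 / other_mean) for _ in range(started - 1)]
        if 0 in race(latencies, keep):
            wins += 1
    return wins / trials


# Storage expansion: 80 pieces stored, any 29 rebuild the segment.
print(f"Expansion factor: {80 / 29:.2f}")

# Download race: 39 started, first 29 kept. Compare a fast, average, and slow node.
for label, mean in [("fast", 0.5), ("average", 1.0), ("slow", 2.0)]:
    print(label, win_rate(mean, 1.0, started=39, keep=29))
```

With identical nodes the win rate sits near 29/39 for downloads (and 80/110 for uploads); a node that is consistently quicker than its peers wins noticeably more often, and a slower one loses more often.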

snorkel · June 20, 2023, 4:25am

… how nice it sounds, but in reality the flow is not a river; it is more like a dripping sink.
You could get 1TB of ingress stored in 5 months or in 15 days; it’s very unpredictable.

snorkel · June 20, 2023, 9:45am

With a bunch of hardware lying around, I would start 2 machines at the same time and check the profitability after 6 months. If there are significant differences, I would transfer the data to a new HDD in the more profitable machine, and post the results here for the entire community to see. We like to share our experiments.
- Machine 1: Synology/ext4 + 1 drive for Storj.
- Machine 2: PC/Linux/ext4 + 1 drive for Storj.
Try using the same HDD model and a smart plug for each, to watch the power draw.

Also, there are SNOs here with big farms, who managed to avoid the /24 limitation by using VPS-es. Check Vadim and Th3Van. There are lots of topics about the subject.
A great member, who also runs Synology, is BrightSilence.

Unique · July 11, 2023, 12:33am

You also have opportunity costs to consider. This is the cost of not doing something else with your equipment. For instance, if you sold it, how much would you get vs the “Profit” you would make over time by retaining it, and how long it would take to break even?

Then there are investment costs. Say you could have sold your equipment for $500, and it took you 3 years to make $500 profit (break even?). That is not quite break-even, as you could have invested that $500 at, say, 5% per year, giving $75 back over 3 years (let’s ignore compound interest). Therefore break-even is really $575, or close to it.
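
The break-even figure can be worked through in a few lines. The $500, 5% and 3 years are the example numbers from this post; the compound-interest variant is added only for comparison, and the function names are illustrative.

```python
def simple_interest(principal: float, rate: float, years: int) -> float:
    """Interest earned when the yearly interest is not reinvested."""
    return principal * rate * years


def compound_interest(principal: float, rate: float, years: int) -> float:
    """Interest earned when the yearly interest is reinvested."""
    return principal * (1 + rate) ** years - principal


sale_price = 500.0  # what the equipment could have been sold for
rate = 0.05         # assumed yearly return on the alternative investment
years = 3

simple_target = sale_price + simple_interest(sale_price, rate, years)
compound_target = sale_price + compound_interest(sale_price, rate, years)

print(f"Break-even, simple interest:   ${simple_target:.2f}")    # $575.00
print(f"Break-even, compound interest: ${compound_target:.2f}")  # $578.81
```

So ignoring compounding understates the target by a few dollars here, which is why $575 is "nearly right".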

While you’re trying to make a profit, inflation is eating away at the value of that profit, and as Storj is not increasing their payouts in line with inflation… well, you get the picture.

Then you have risk. Let’s face it, crypto is heavy on risk, and I have seen the value of Storj plummet. What I’m trying to say is: be realistic and just do it for fun, as profit is a lofty goal.