Multiple 2TB nodes or a single 10TB node?

I have seen a few posts advocating that 2TB is a “sweet spot” for a node, but I haven’t seen any clear explanation of why this is the case. Can someone explain this for me?

I am currently running a single 8TB docker node (2.5GB used space). If I can fill the space faster by running more containers on the same hardware, I am willing to give it a shot, but I can’t see the value in it, knowing the same CPU/memory/network is in play.

You won’t be able to fill space faster if all the nodes are within the same /24 subnet: Storj will treat them as one node. Traffic will be balanced across them in my experience, but it won’t get you any “more”.

One benefit of splitting this into multiple nodes, with one node per drive, is that as the drives fill up you can migrate to larger drives while keeping the established reputation and vetting of the whole setup - everything except the new drive is already proven. I started with one 2TB drive, then added a 2TB and a 1TB drive, and when they all recently filled up, I upgraded the 1TB to an 8TB drive. I’m now waiting for that 8TB drive to either fill up or plateau. If it fills, I’ll probably upgrade one of the 2TB drives; if it plateaus, I may start up another node. I’m already thinking about putting the now-available 1TB drive on a new node just to start the vetting.

That is cheating, not a benefit. The vetting process was designed to test your setup for reliability, so that a noticeable amount of customers’ data is not lost if your hardware fails early on - hardware and software problems are usually discovered at the beginning.
When you migrate such a pre-vetted node to the actual hardware, it could fail and lose a noticeable amount of customers’ data. So this method is not good for the network.

@cpare You can run several nodes, but as pointed out, they will not get more data than a single node if all of them are in the same /24 subnet of public IPs - we want to be as decentralized as possible. If you run them on the same HDD (which is against the Node Operator Terms & Conditions, by the way), these nodes will affect each other: they will try to use the same resource (the HDD) at the same time and, as a result, they will lose races more often than a single node on that HDD would.
One of the benefits of multiple nodes in the same /24 subnet of public IPs is that you can start them gradually - when the first node is almost full, you start the next one, and it will receive the full ingress traffic once the first node is full. This way you can extend your storage.
Or you can start the next node when the previous one is vetted, to spread the load (useful when you have SMR drives).
The other benefit: when you run several nodes (each on its own HDD), if one HDD fails, you lose only that part of your common data, not all of it as in the case of one node with a big storage array.
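
To make the “start them gradually” part concrete, here is a rough sketch of what a second node on its own drive could look like with Docker. All values below (paths, wallet, address, ports) are placeholders; the second node needs its own generated identity and its own host ports so it doesn’t collide with the first:

```sh
# Second node: own identity, own disk, own host ports.
# Every value below is a placeholder - adjust to your setup.
docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28968:28967/tcp -p 28968:28967/udp \
    -p 127.0.0.1:14003:14002 \
    -e WALLET="0xYOUR_WALLET" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.external.address:28968" \
    -e STORAGE="2TB" \
    --mount type=bind,source=/mnt/disk2/identity/storagenode2,destination=/app/identity \
    --mount type=bind,source=/mnt/disk2/storagenode,destination=/app/config \
    --name storagenode2 storjlabs/storagenode:latest
```

The second node’s dashboard would then be on port 14003, and port 28968 has to be forwarded on your router alongside 28967.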

@cpare, there are reasons to run multiple nodes on the same hardware, but filling space faster is not one of them.

So is operating more than one node on a single IP address. Yet this practice has been recommended many times by Storj here on the forum to utilize many drives. Other conditions are also quite outdated, like the 99.3% online requirement. Why would SNOs take the ToS seriously?

Whilst I don’t disagree with your sentiment, it’s totally within Storj’s control to police the nodes and SNOs, including the terms and conditions, as they see fit. Initially, downtime/offline suspension wasn’t enforced, but in March 2021 it was enabled for all nodes. Similarly, the startup check for the amount of free disk space didn’t exist, and now a node won’t start without enough allocated and free space. It’s not beyond the realm of possibility for Storj to enable some kind of check so that only one node can run from a single disk, if that setup were harming the network (I know there could be workarounds, and it’s not currently a priority for them).

No you may not: “Operate more than one (1) Storage Node behind the same IP address”
Does this still apply?

If we were to take this as stated, then no one should be migrating node data to different hardware. Yet Storj openly provides instructions on how to do so?

It’s one thing to state that some limit is not enforced yet, which is what happened before March 2021; it’s another to explicitly encourage a specific practice that is against the ToS.

To add another perspective, I run my node on ZFS datasets. Using the same hardware that has been through the vetting process, I can increase the free space available to Storj through a simple setting.
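
For anyone curious what that looks like in practice, here is a minimal sketch, assuming a hypothetical pool/dataset layout like tank/storagenode:

```sh
# Raise the dataset's quota ("tank/storagenode" is a made-up name):
zfs set quota=8T tank/storagenode
# Then raise the node's allocation to match (e.g. the STORAGE
# variable in the docker run command) and restart the container.
```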

Please don’t use the migration guide as an excuse to cheat the vetting system.
The Migration Guide to Other Hardware (to another disk in particular) was originally designed for those who have a full disk and would like to expand the space provided, or who want to migrate to less power-hungry hardware like the Raspberry Pi.
Of course, this guide can be used for other reasons as well; we do not limit it.
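
For reference, the core idea of that guide is to copy the data while the node keeps running and only stop it for the final pass, roughly like this (paths and container name are placeholders):

```sh
# First pass(es) while the node is still running:
rsync -aP /mnt/old/storagenode/ /mnt/new/storagenode/
# Repeat until the delta is small, then stop the node:
docker stop -t 300 storagenode
# Final pass with --delete so removed pieces don't linger:
rsync -aP --delete /mnt/old/storagenode/ /mnt/new/storagenode/
# Recreate the container pointing at the new location.
```
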
However, what you are suggesting is a way to bypass the vetting system in order to reduce the amount withheld and get a node up and running immediately, without vetting, later on. Addressing this does not have a high priority right now; moreover, there is a chance that we may change the vetting system - at least it’s being discussed.

So bypassing the vetting system should not be a reason to run multiple nodes, especially on the same disk.

I can cite only this:

This is still a requirement; however, the penalty is not as strict as it was before.
Now your node will not be disqualified immediately when it fails to meet this requirement. But there are other penalties as well, like suspension and losing data while your node is offline.

And I agree that our protocol should be updated to make breaking other aspects of the ToS economically unviable, as was done for the one node per one IP restriction, or for the requirement that all nodes use the same wallet.

I propose we stop here, rather than start another round of the same discussion, and return to the original topic - which is better, a lot of 2TB nodes or one big one?

Back to the original topic, here’s my take on it:
Whether you have one big node or a few small ones, you will get the same amount of data on them, but there are other differences. I actually have a single large node (21TB used), but I can see why I would want to have many small ones instead.

Why would I want to have multiple smaller nodes:

  1. No need for a large upfront investment. I could buy new drives as my nodes fill up.
  2. If one node dies, it does not hurt as much as a single large node dying.
  3. Because of 2, I can take the risk of running the drives without RAID, minimizing costs.
  4. Lower hardware requirements for a small node could result in lower power consumption.

Why would I want to have a single big node:

  1. Easier to monitor and update one node instead of 10.
  2. I can use RAID to have higher reliability.
  3. I can use the same server for other things.
  4. Easier on physical space - a single 4U server can be put in a rack or on a shelf; connect a couple of power cables and a LAN cable and I’m done. Placing and wiring up 10 Raspberry Pis would be more annoying.

I would also add that a single big HDD usually consumes less power per TB than many small disks - as a rough example, five 2TB drives at ~5W each draw about 25W (2.5W per TB), while one 10TB drive at ~7W works out to 0.7W per TB. But that only holds once the disk is full, which could take years…^^’
