5 nodes on the same HDD vs 5 nodes on separate disks

Well, I don’t think it makes sense to start a node on a 20TB disk and run it until it dies. By the time the disk fills up and you’re recouping your big 20TB investment, the disk is already walking dead. Suddenly, it’s all gone with the wind. It would make more sense to fill up several small disks, and when you buy the big one, move all your nodes to it and fill it up immediately.
However you do it, it seems to me that it’s preferable to run several small nodes instead of one big one, even if you’re running just a single disk. The idea is to keep your nodes manageable. Nobody knows what the future will bring. When you have just one big node, you run out of options and you might be forced to do exactly what you described: run the big node until the disk crashes. You lose everything and Storj wastes a lot of money on repair. Keep the node sizes manageable and you have options. You can move them around, you can change your strategy along the way (you never know when Storj will grow very fast or when it will idle), and you might just avoid ever losing a big chunk of data.

I think you miss my point. If Storj wants to store data at scale (EiB) then hobbyists will not be enough to create the supply (capacity) needed. My example is for competitive commercial operations that expect to make a profit - primarily in response to @Knowledge’s comment. Storj would need to charge more than $5/TiB/month to compensate commercial providers.

Currently I am at 20 TiB on three HDDs (nodes) with plenty of room to spare; it is like watching water drip. In my case, I am not motivated by profit, since this is a hobby machine for me.

Or those operations would need to run differently from traditional setups. I’m thinking of an automated setup with large racks running low-power, low-performance servers with as many HDD bays as possible. Plugging in a new HDD would automatically assign a new /24-range IP and start a node. You’d only have to monitor for HDD failure and replace any HDD that dies. No RAID setups, so no space loss as a result. The only things you need to beef up are the network and redundant power, which can be done at scale relatively cheaply. And for the most part a single person could run that entire data center.

All other stuff like certification, legal and other costs would not scale with capacity, so should be relatively affordable at scale. Just adjust operations to the requirements instead of running this as a traditional data center. You could do way better than $5 per TB if you make it purpose built.

This is against the Node Operator Terms & Conditions; you should not use the same disk for more than one node.
The nodes will also affect each other’s ability to offer good service to the customers. Since all nodes behind the same /24 subnet of public IPs are treated as one big node, it makes no sense to move all of them to one disk. You may as well start one big node in that case, because if the disk dies, all the nodes die with it.

My first node ran 21 months without competition and got up to 6TB. With the competition added, I now have 7.5TB after 23 months. At this rate, I would fill a 20TB disk in 5 years; I would have a full 20TB disk precisely by the end of the warranty. I would spend 5 years wearing down a 20TB disk while being far from using the capacity I paid for (on average, I paid for 20TB and used half of it until the warranty expired). This is why I say it doesn’t make sense: you pay for the whole 20TB but you won’t be using it for most of its life. After 5 years (the warranty) I have an unmanageably large node and I’m waiting for it to crash. With a single IP/24, this is about all I can have (your calculation!). When it crashes I’m back at ground zero.
That doesn’t mean it makes sense to buy twenty 1TB disks, fill them up, then buy a 20TB disk and move the 20 nodes onto it. There is a sweet spot somewhere in between.
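For what it’s worth, the fill-rate arithmetic above can be sketched quickly. These are my own numbers, not a general prediction:

```python
# Fill-rate arithmetic from the numbers above (not a prediction).
tb_stored = 7.5                          # TB stored after 23 months
months = 23
fill_rate = tb_stored / months           # ~0.33 TB/month

disk_tb = 20.0
months_to_fill = disk_tb / fill_rate     # ~61 months, i.e. about 5 years

print(f"{fill_rate:.2f} TB/month, full in {months_to_fill:.0f} months")
```

Any change in ingress obviously shifts the result, which is exactly why a 20TB bet made today is a guess.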

Anyway, I didn’t buy any disks for Storj. I’m just thinking about what I would do if I had no hardware and wanted to start a node. I would not buy a 20TB disk…

I buy the same brand I buy cars, “Toyota”.

No need to invest money in 20 TB for a new node. I used old 4 TB HDDs, and now I slowly replace them with bigger ones. Each of my old used HDDs has already made me $1000 in income, so I invest in a new HDD, copy the data over, and I start with 4 TB already filled; filling rates are rising like hell now.
If your old HDD is good enough, you can start a new 4 TB node, wait until it fills too, and then change to a bigger one in the future.



What difference does it make to the network? Two disks on two USB ports of an RPi, or two nodes on a single disk of an RPi?

Obviously all nodes will die eventually. But that doesn’t make any difference to the network. The same number of TBs dies, whether it’s one node or several.

But let’s simplify and say a disk dies the minute it reaches the end of the warranty and let’s say the warranty is 5 years.

  1. A node started on a 20TB disk will fill up in 5 years and die. The network loses 20TB. The node gave the network an average of 10TB for 5 years.
  2. A 20TB disk is filled with five 4TB nodes. After 5 years the disk dies. The network loses the same 20TB. The disk (5 nodes) gave the network an average of 20TB for 5 years.
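The two scenarios can be compared in terabyte-years served, under the same toy assumptions (linear fill, death exactly at the warranty mark):

```python
# Toy comparison of the two scenarios above (assumes linear fill and
# that the disk dies exactly at the 5-year warranty mark).
disk_tb = 20.0
years = 5

# Scenario 1: one node starts empty on the 20TB disk and fills linearly,
# so on average it holds half the capacity over its life.
avg_tb_1 = disk_tb / 2                      # 10 TB on average

# Scenario 2: five 4TB nodes, already full, are moved onto the 20TB disk,
# so the disk holds the full capacity the whole time.
avg_tb_2 = 5 * 4.0                          # 20 TB on average

print(avg_tb_1 * years, avg_tb_2 * years)   # 50.0 vs 100.0 TB-years served
```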

Which one is better for the network?

Moreover, a node with 20TB is unmanageable. There’s not much you can do but watch your disk die, and the network is certain to lose 20TB.
Five 4TB nodes on a single disk are still manageable. When you approach the 5-year certain death, you can still move the nodes to another disk or disks. The network loses 0TB.

Again, which one is better for the network?
Which one is better for the SNO?

I’d say that is a perfect strategy… while respecting the rule “no more than one node per disk”.

It’s just practical technical advice. A storage node is a bunch of small files; working with them needs a lot of IOPS, and if you have two nodes on one disk, IOPS will be the bottleneck.

But there is also a rule:

  • Have a minimum of one (1) hard drive and one (1) processor core dedicated to each Storage Node;

Buy a new drive and copy the data to it. 20TB is 20TB, whether it is a single big node or 4 smaller nodes on the same disk.

It’s against the Node Operator Terms & Conditions, and the nodes will affect each other if both store data on the same disk.
From the network’s perspective, it guarantees that both nodes fall offline at the same time, so the customers’ data is in danger.
If we lost customers’ files, there would be no data for storage node operators either. So it’s better to set them up separately.

Yes. But all the statistics are based on uncorrelated failures, which is not the case when you use not only the same hardware and the same location, but also the same disk; that ruins all the estimations.
Of course, the network will deal with it. But if all operators follow your bad advice, the network is in trouble.


If only life were that certain. Chances are you’re using that HDD for 10 years. I’m not moving anything until that HDD actually dies. And I don’t have any illusions that I will be able to copy all that data without issues when the HDD starts showing signs of dying. That’s highly unlikely whether it’s 1 or 5 nodes on that HDD.

As I understand, multiple nodes with the same IP are treated as one big node and not given more than one piece of a file in total. So, what is the difference?

There would be a difference if those nodes had different IPs (different /24).


The difference is that the probability of losing two disks at once is at least two times lower.
Of course your motherboard could die too, and in that case there is not so much difference, but what is the probability of that?
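As a toy illustration of the correlated-failure point, assuming independent disk failures and a made-up per-period failure probability `p`:

```python
# Toy model of correlated vs uncorrelated node loss. p is a made-up
# per-period failure probability for a single disk; failures on separate
# disks are assumed independent.
p = 0.05

# Two nodes on ONE disk: a single disk failure takes out both nodes.
both_lost_same_disk = p                  # 0.05

# Two nodes on TWO disks: both nodes are lost only if both disks fail.
both_lost_two_disks = p * p              # 0.0025

print(both_lost_same_disk / both_lost_two_disks)  # 20.0x less likely here
```

The exact ratio depends on the assumed failure rate, but separate disks always come out ahead under independence.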

Sorry, my mistake, there is a rule for this:

  • Have a minimum of one (1) hard drive and one (1) processor core dedicated to each Storage Node;

If the motherboard is dead, it doesn’t mean the nodes are dead; if you change the motherboard, the nodes can be restored in most cases.


Yeah well… so is hosting more than one node behind a single IP. When the document hasn’t been updated for years and Storj Labs has changed its attitude towards many of the things stated in it, it’s hard to know which parts to take seriously, and we’re kind of left to judge for ourselves which parts we should follow. This update has been long promised but never happened.

Those stats don’t account for risk of correlated failure of nodes which may contain multiple pieces for the same segment. If nodes use the same IP, that will never happen. So running multiple nodes on an HDD will not be any different from a risk of failure perspective unless you somehow use multiple IP /24 ranges as well.

Of course the impact on performance is still valid.


From the network perspective - 1x20TB node vs 20x1TB nodes with the same public IP, what is the difference?

Would the 20x1TB nodes get more data (multiple pieces of the same file?) that would make it more dangerous for customer data if all of those nodes went down at the same time?

It has been stated repeatedly that the network considers all nodes within the same /24 as “one big node” and does not give more than one piece of a file to all of them combined. Was that incorrect?

No, they will be the same, at least if they are in the same /24 subnet of public IPs.
However, I believe that different HDDs are more reliable than a single one.

I do not dispute that, but the original situation was 4x5TB nodes vs 1x20TB node on a single 20TB drive. Is there a difference? Assuming the same public IP.