Storage Node with multiple HDs

Jov · January 17, 2020, 10:44am

Hi together

First of all i love this project and i think it’s one of the most interesting use cases for blockchain.

The Storj Team recommends that you should make a new node for every HD. One of my problems is that i have a lot of HDs under 500Gb that i would like to use aswell.
What i don’t understand is why they are not accepting that you just mount more than just one directory to your node. For sure your node would loose reputation if one HD fails but at least you have a longer runtime instead of just configure a new node everytime your space is full and the reputation process begins from start with every new node. Like that you also are not loosing the whole hold back amount if a HD fails and instead you loose only the exact amount the system needs to repair the data from the failed drive.
Wouldn’t that be a better solution?
I’m quite interested what you people think about that.

Odmin · January 17, 2020, 10:47am

I have another opinion that I described here.

Jov · January 17, 2020, 11:03am

Thx for the input. But i thought it doesn’t make a lot of sense to use a raid because you loose space that could contribute to the network.

Odmin · January 17, 2020, 11:08am

Yes, it true.
But what happened if you loose one node? You can prepare a list of answers for yourself and make a decision about RAID.
I just share my opinion from SNO point of view.

champmine18 · January 17, 2020, 1:03pm

I agree, if you have plenty of disks, why not have raid5 or raid6 (software ok); if you have 8 disks you can spare 1-2 as spare disks. Plus with raid you get the benefit of higher iops compared to a single disk, which eventually increases your successful download rate.

deathlessdd · January 17, 2020, 1:26pm

I think the main problem here is that raid is far better, But the entire idea of Storj is using hard drive space you dont use. Which would give everyone in the world a chance to be apart of it. People around the world arent going to have the funds to go out and buy a bunch of hard drives if the whole idea is to use “space you dont use in your pc” More and more people are going to go rent VPS in datacenters creating a centralize cloud storage instead of decentralize storage.

Vadim · January 17, 2020, 1:34pm

it may be not very good, but you can put HDDs to spanned or striped partition, but if 1 die then node gone

BrightSilence · January 17, 2020, 3:48pm

If the individual drives are too small to run a node, I can see why combining them in a RAID may be the better option. If you’re doing that though, I definitely recommend adding some redundancy. Depending on how many disks you have RAID6 would be preferred, but with 6 or fewer I’d say RAID5 is acceptable. It’s all or nothing with this setup though. If you lose data, your entire node is gone.

The reason I would normally recommend running separate nodes is because you make MORE money until the first HDD failure. After that you make the same amount of money as you would have with a RAID setup. Basically, a failure only makes you lose your added income to begin with. But if your HDD’s individually aren’t large enough, you have little choice but to use RAID.

I absolutely disagree that RAID is a must as @Odmin’s page suggests. In fact in most cases it’s a worse option. We can argue about the better approach, but saying it’s a must is just plain false.

Edit: I went into the details of this a while back in the following post

Odmin · January 17, 2020, 4:30pm

I think you skipped “Introduction” section:

Can I ask you about your Synology NAS? what about your disk system, are you on a single disk?

Also, if you lose one node with one disk and start from scratch another node, you lose you reputation and deposit. Another node also will start of 25% payout instead of continue already earned 50%-75%-100% payout. So, from SNO point of view - much better do not loose node; from Storj network point of view, better fresh nodes with single disk (payouts only 25%)

Anyone can make a simple excel table with calculations and graphs and make a decision, which will be better for yourself.

I pay you attention for this blog post:

Ask himself (like SNO):

Are your node can survive 15 months with a single disk? or you need to add some redundancy?
Are you able to lose your already earned deposit and reputation?
Are you able to lose 75%-100% payouts and starting from scratch with 25% payouts?
Are you able expand your storage node disk space and continue with 100% payouts or you would like start a new node with 25% payouts?
Are you have a plan be long term SNO with reliable storage node or not?

It a key questions for any SNO, and answers will put you in the right direction.

deathlessdd · January 17, 2020, 4:37pm

I do agree with you But to have datacenter hardware you need a very high amount of over head to even start. You will never make the amount of money back that you put into building a high end storagenode.

Odmin · January 17, 2020, 4:39pm

You can use your OLD hardware, where are you read that I recommend data center hardware?

deathlessdd · January 17, 2020, 4:41pm

Having redundancy is pretty much datacenter hardware, You may not have said it, But who has a bunch of hard drives that are made for this kinda wear and tear laying around. Having hardware raid card cause you cant depend on software. Its still cheaper to buy one hard drive and if the node fails then you just start over.

BrightSilence · January 17, 2020, 4:51pm

I think you’re trying to back up one assertion with another. I’d argue most SNOs want to optimize profits and loss of escrow is only one element of that. You would only lose escrow one the one node that failed if you run one node per disk. And that would be a node representing disk space you wouldn’t even be making profits on if you had used that disk for redundancy. It would take at least 2 failures for you to start making less money, despite the loss of some of the escrowed money.

It seems you already know I’m using RAID. If you had followed the link I posted in my previous post you see that I added an exception. If you are using free space on an array you already have, using RAID is obviously fine. I would add that if you have the choice of running a separate disk vs adding a disk to a RAID 5 or 6 array, adding it to the array is probably the better option as you’re not wasting any additional disk space on redundancy. For most people this is not a setup they already have. And those that do probably don’t need my advise to make this decision anyway.

Again I point to the link I just posted as I did make the relevant calculations there, which include cost of loss of escrowed money as well as taking a generous 2 year time frame for a node to rebuild reputation. If your goal is profit maximization the math still doesn’t add up for a RAID setup unless you already have an array or the separate disks are too small.

Odmin · January 17, 2020, 5:04pm

Thanks for sharing your opinion, I respect it.
My main goal: build highly reliable storage nodes for long-term cooperation and not lose reputation and money. I, of course, I will be use redundancy to achieve this goal instead of use a lot of hardware for connecting single disks.

Vadim · January 17, 2020, 5:39pm

RAID also can go crasy, seen this lot of times.

eagleye · January 17, 2020, 6:47pm

I think the STORJ idea is to maximize hard drive space and minimize nodes.
STORJ has built in redundancy. Nodes have a choice.

Nodes can choose single drive, RAID, SPAN OR STRIPED drives.
First choice for node reliability is a multi-drive RAID setup that gives you backup redundancy and recovery in case a drive crashes.

Next I think SPANNED or STRIPED drives put together would work fine for small drives under 1 TB. once you go higher and are storing more data than the loss of the node will be more costly when a drive fails and you cannot recover the data.

To run multiple nodes from the same IP will limit data sent by the satellites, from what I understand. Maybe that is better for small drive nodes, in case one drive goes down, that node loss will be little base on small amounts of stored data.

I have yet to see and think there should be a performance table expected from node operation. For example a 10/100 Mbps ethernet line can only handle 1 node. But a 1 Gpbs/ 1Gbps ethernet could handle 2-10 nodes. All for best node performance.

My 10/100 ethernet can only handle 1 node. Uploads are optimal and very successful, but downloads are not as they max out at 10 Mbps. For that reason I have a higher context cancelled on downloads.

cdhowie · January 17, 2020, 9:08pm

I have exactly the opposite experience. Never observed any issues with md-raid on Linux. It’s always been rock-solid for me. Have had multiple hardware RAID controllers inexplicably die and take drives with them.

Vadim · January 17, 2020, 9:11pm

Hardware raid is not relible without battery. And battery die always unexpectandly. And not live long.

Odmin · January 17, 2020, 9:14pm

Good Raid controller that can make RAID 5,6 with acceptable IOPS is too expensive and can be cost like all your hardware for storage node

Vadim · January 17, 2020, 9:16pm

may be is better buy relible HDD? then invest to relible RAID