Do I need a raid or mirrored array

Or does Storj back up everything?

You need to make sure data stored on your drive is accessible and safe from failures. The recommended setup is RAID-less: one HDD per node. If you lose data, you fail audits and your reputation takes a hit. Can you explain what you mean by "Storj backs up everything"?

Let me ask this: if my HDD fails and I lose all the data, does this hurt my rep? And if it does, how should I mitigate the risk of that happening?

1 Like

If your HDD fails and you lose all the data, your node will be disqualified. If disqualification occurs, you can sign up for a new ID. There is a vetting process for nodes that consists of 100 audits. This number is listed somewhere in the forum… I can find a link if needed…

So, let's say you are running a node with a single HDD for about 8 months. In those 8 months, the drive becomes nearly full. Then the drive fails… and your node becomes disqualified.

You sign up for a new ID, go through the 100 audits, and start plugging away again.

However, you've now reset the clock on the escrow payments, and it will be another 10 months before your node receives full payout.

This is why I strongly reject the idea of 1 HDD per node. It's simply handing the Storj network your hardware for cheap use until it fills up and dies. I strongly recommend a setup of 3 drives in RAID 5… or similar. Such a setup allows you to replace failed drives without being disqualified for lost data. Your node ID then has a good chance of surviving the 10-month escrow period to become a long-lived node with full monthly payout.

2 Likes

Please do not use RAID5 on big disks.

1 Like

It all depends on your specific configuration…

I'm running my own node on a test server which hosts several other experiments.

Dual Xeon, 62 GB RAM… with a bunch of SAS drives configured on an LSI hardware RAID controller.

1 Like

OK, so the ideal situation is RAID 6 or 10 with enterprise drives. Or does it make much difference if you have SATA drives in RAID 6 or 10?

RAID10 is faster, but you lose half the space to redundancy.

1 Like

The ideal situation is still one node per disk. With annual failure rates for HDDs down to 2% these days, if you have 3 HDDs you're much better off sharing all that space across 3 nodes and running the risk of one of them failing. You'll have more space to share until it does, and more money to make on the same hardware.

The only scenario where I would suggest using RAID is if you are using spare room on an already existing RAID array. Otherwise you're just wasting space on redundancy that you could be using to make money.

Getting back to the initial question. Storj uses erasure codes to split up the files and make pieces redundant. Out of 85 pieces initially uploaded, only 29 are needed to recreate the file, and a repair is triggered on the network when the number of pieces drops to 35 or lower. So there is no need to have redundancy on the storagenode itself. Nor is it more profitable in the long run for the node operator. So I'd avoid it.
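To make those erasure-coding numbers concrete, here is a tiny back-of-the-envelope sketch. The 85/29/35 figures are the ones quoted in this thread; the actual satellite configuration may differ.

```python
# Redundancy numbers quoted in this thread (may differ from live settings):
k = 29                 # pieces needed to reconstruct a file
n = 85                 # pieces initially uploaded
repair_threshold = 35  # repair is triggered at or below this piece count

expansion_factor = n / k                  # network-wide storage overhead
losses_before_repair = n - repair_threshold  # pieces that can vanish first

print(f"expansion factor: {expansion_factor:.2f}x")            # → 2.93x
print(f"pieces that can be lost before repair: {losses_before_repair}")  # → 50
```

In other words, the network already carries roughly 3x redundancy per file and can lose 50 of the 85 pieces before a repair even kicks in, which is why node-local redundancy adds little.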

5 Likes

I strongly disagree.

This analysis does not take into consideration the ramp-up time for accumulating data on a given drive, nor the very long 10-month escrow policy. If a node runs for 6 months and then dies and gets disqualified, that node operator loses 6 months of escrow time.

Since the first 3 months are only 25% payout, followed by 50% payout for 3 more months, a drive failure resulting in data loss means a significant loss of future income based on the lost reputation stored in escrow. Furthermore, any new node replacing the lost node will start with zero data and need to accumulate data for a long time to reach the same point as the lost drive.

So, initially, running 3 nodes one drive per node will yield a larger payout. However, any future drive failure leading to data loss and node disqualification will result in a very large percentage loss of time and income for 10 months.

Furthermore, data accumulation is slower as the geographical distance increases between a satellite and a given storage node… and the loss of a long-running storage node may not be immediately fixable by simply starting up a new one, since the number of storage nodes may have increased in the subnet.

The idea that one drive per node, running multiple nodes, is the way to go is not necessarily the case; it depends on the goals of the SNO as well as the specifics of the SN's network and geographical position.

1 Like

Let's translate that risk into average costs and compare.

Let me start by linking to a source for annualized failure rates, as I don't want to be accused of pulling numbers out of my ***.


They have an average of 1.8%, but let's go with 2% to make it easier.

The first year you'll indeed have 3 months of 25% payout, 3 months of 50%, 3 months of 75% and 3 months of 100%. But let's say that, on average, you lose 50% of first-year earnings if your node fails during that year. It'll be more if it fails earlier in the year, less if it fails later. During the next year, 50% of the amount held in escrow is paid back to you, so that loss drops to 25% of the first-year income.

Scenario 1: 1 node on RAID5

Let's assume perfect protection in a RAID5 setup and a 0% failure rate. This setup can offer 2/3 of its storage capacity to the network and gets the theoretical 100% of possible income on those 2/3.

Scenario 2: 3 nodes for 3 disks

2% (failure risk) * 50% (loss when node fails) = 1% loss of total income on average in the first year. Because of the partial payback of escrow, that risk drops to a loss of 0.5% per year of the income the node made in its first year.
Of course, if you do lose a node, you have to collect new data again. It's hard to predict how long that will take in a production scenario, but let's assume it takes 2 years to get back to the same level, growing evenly over those 2 years. This means that over those 2 years you basically lose a year's worth of income for the failed node. This cost can be expressed as 2% (chance of failure) * 100% (loss when node fails) = 2%. If the failure happens during the node's first 2 years the loss is of course lower, but let's ignore that. The total loss from escrow plus having to collect new data comes down to about 3% per year on average.
The upside is that you are able to share 100% of your drive capacity with the network.

Conclusion

Scenario 1 is indeed the better option if you already have the hardware lying around and you never get more data sent to your node than 2/3 of the capacity you have available. About 3% more profitable, to be exact.
As soon as the network sends your nodes more data than those 2/3 (or to be more exact, 103% of that 2/3 size), scenario 2 becomes more profitable. How much more profitable? Between 0 and 30%, depending on how much of your storage gets filled; 30% when all space is used.
I'll take the tiny risk of node loss now over the certainty of a significant long-term loss of income from simply sharing less space.
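The break-even point above can be sketched numerically. This uses only the thread's own figures (RAID5 shares 2/3 of raw capacity at an assumed zero risk; three single-disk nodes share everything but lose ~3% per year on average to failures), so treat it as back-of-the-envelope, not a precise model:

```python
# Break-even sketch using this thread's figures.
raid5_capacity = 2 / 3    # usable fraction of 3 disks in RAID5
avg_failure_loss = 0.03   # ~3% average yearly income loss for single-disk nodes

# Fill level (fraction of total raw capacity) at which three single-disk
# nodes start out-earning the RAID5 node:
break_even_fill = raid5_capacity / (1 - avg_failure_loss)

print(f"break-even at {break_even_fill:.1%} of raw capacity")          # → 68.7%
print(f"that is {break_even_fill / raid5_capacity:.0%} of the RAID5 size")  # → 103%
```

Once the network fills more than ~103% of what the RAID5 array could have offered, the three-node setup wins on expected income.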

Of course this equation may change if you have more HDD space lying around than the network can reasonably ever fill. Then by all means use RAID (but even then avoid RAID5, as mentioned by @Alexey).

All this assumes you already have 3 HDDs lying around. If you have to justify spending money on that third drive, it becomes even harder to argue for a RAID setup, because recouping the cost of the additional HDD with the average 3% extra income would probably take more than a lifetime.

I think I have been extremely reasonable in this assessment: using a higher failure rate than Backblaze measured, assuming perfect protection from RAID5, and assuming very long recovery times for collecting new data on new nodes. Still, I can't find good reasons to go with RAID over single-disk nodes unless you already have a large array of large drives lying around. If you disagree, please point out specifically where my calculation is incorrect.

4 Likes

This is not how statistics work. You assume almost the worst-case scenario to come to your conclusion. Assuming the 2% failure rate per year is correct, 99% of hard disks don't fail within 6 months. This 1% chance of failure doesn't justify the 100% wasted space. OK, you can multiply the failure rate by the number of disks, but you're still worse off with RAID.

1 Like

A given node is very unlikely to accumulate enough data to justify the extra storage space.

My own node has been running for 2 months. As I've indicated in other threads, nearly 100% of my income for those two months is bandwidth… not drive space. So far, I've been paid $0.02 for my drive space. This month, the estimator says $0.27… while bandwidth is running about $8.75…

So, the difference in income between running 3 drives vs. 1 drive in the first months is likely to be on the order of a few cents…

If drive failure is at 2%… and there are 100 SNOs running 3 drives each, 6 drives will likely fail…

That's 6 SNOs losing money, reputation, and 10 months of escrow.
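The expected-failure figure quoted here checks out arithmetically (a quick sanity check using the thread's numbers, nothing more):

```python
# 100 SNOs, 3 drives each, 2% annual failure rate (numbers from this thread).
snos = 100
drives_per_sno = 3
annual_failure_rate = 0.02

expected_failed_drives = snos * drives_per_sno * annual_failure_rate
# Chance that a *specific* SNO loses at least one of their 3 drives in a year:
p_sno_hit = 1 - (1 - annual_failure_rate) ** drives_per_sno

print(f"{expected_failed_drives:.0f} drives expected to fail per year")  # → 6
print(f"{p_sno_hit:.1%} chance per SNO per year")                        # → 5.9%
```

So while ~6 drives per year fail across the group, any individual SNO faces a bit under a 6% yearly chance of being affected, which is the probability the other side of this debate keeps emphasizing.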

I can work through a graphic if you'd like. I need to anyway for my testing.

If your node stores more data, it will see a similar increase in downloaded data in a production scenario. After all, you have to store data in order for it to be downloaded from your node; more pieces stored means more chances of having a piece downloaded. Please note that during the current tests this might not yet be the case, as specific test patterns could lead to download patterns that don't represent real-world use.

As @donald.m.motsinger pointed out, you continuously zoom in on the worst-case scenario and ignore the chance of it happening. Yes, in the scenario you describe, 6 SNOs would lose one third of their escrow, which is already half of their income; so a 1/6th income loss as the worst case, at a 2% chance. But you completely ignore the other 94 SNOs, who would have seen absolutely 0 benefit from wasting an additional disk.

As for it being unlikely that you will accumulate enough data to fill more than 2/3: I don't know how you can make that assessment at this point. It completely depends on the size of your HDDs, and we haven't seen the network in action with heavy production use yet. That said, my node has at moments stored 5TB and is holding 3TB already, and it's still early days. If you have 3x4TB drives… I wouldn't bet on it never going over 8TB.

Who wants to volunteer to be one of the 6 who loses out?

A recommendation of one drive per node amounts to gambling that the SNO will be lucky for a long time. It is not a long-term solution… and doesn't lend itself to running a profitable service from the SNO's point of view.

There are two competing viewpoints.

  1. The view of the SNO.
  2. The view of the Satellite Operator.

The SNO wants to get past the 10 month escrow period to reach full payout.

The Satellite Operator is betting that any given SNO… or a decent portion of them… won't survive the 10-month escrow period. If a portion of the SNs go offline, the Storj network still doesn't lose data… and the stored data is cheaper, since the SNOs disqualified for failed drives/downtime/etc. never reached the 10-month full payout.

So, if an SNO would like a chance at getting through the 10-month escrow period, that SNO should seek to limit the possible failure rate of the SN as much as possible. One drive per node is like a game of darts… someone's going to have a drive failure. I don't want that drive failure to let the Satellite Operator take another 75%… 50%… 25% cut of my bandwidth, electricity, and drive wear.

1 Like

Do you ever leave your house? There is a non-zero chance of getting hit by a car. I rest my case here…

2 Likes

Now you are simply dead wrong… There is absolutely no upside for the satellite operators in losing nodes frequently. The most important thing is to keep data secure, and escrow is used to fund the needed repairs. Once again you're not looking at the complete picture: over time, every piece that was lost will need to be repaired. It may take a while until that repair is triggered, but eventually it will be, so those costs will be incurred anyway.

Is this the question any SNO is facing? No! The question is whether you want to take a less than 2% risk of that happening, which on average will cost you much less than the alternative in most situations. It's pointless to continue this discussion if you refuse to accept that the low chance of this happening should factor into the decision. If you want to be 100% risk-averse, that's totally fine, though not all that rational. But translating that irrational consideration into blanket advice for all SNOs is simply not fair to them.

It seems you're not necessarily taking issue with my calculations; you're just not willing to take any risk at all. That's a different discussion, and not one I'm particularly interested in having. Everyone will have to decide for themselves where they fall on that spectrum: though there is a statistically more profitable method, some people might still prefer the road with the lowest risk. So I will rest my case here. The info anyone needs to decide is in my earlier calculations.

3 Likes

I shouldn't… but it's too tempting… and it's Friday:


Iā€™m sure everyone is simply voicing sincere opinions.

1 Like

The data received by a particular SN depends on the geographical position of that node relative to a satellite, the speed of the node's hardware, and the available peak bandwidth of the Internet connection.

So… when discussing SN profit… one must include these variables.

While your particular SN seems to receive a lot of data, my node sees much less. This is due to geo-IP… and other particulars of the individual setups. One cannot simply claim that more storage space equates to more payout. It doesn't necessarily work that way.

It's highly possible that a given SNO could have 25 TB of hard drive space but be operating in an area that is too far from any satellite to win the speed race for data. In such a case, the SNO would be much better served running with less storage space and more reliability.

I believe it is irresponsible to make a blanket claim or recommendation that one drive per node is the way to go. The other factors need to be listed…

While this discussion may seem duplicative and unnecessarily long, I strongly feel it is important to hash (get the joke?) it all out, in order to honestly address the possible limitations and risks any new SNO may face.

That's fair. Though current traffic on the network is not necessarily representative of how things will be in the future; to a certain point it's trying to predict production-like network behavior.
I think for people like you, who may be further from the source of most test data, things will get better when there is more actual customer traffic on the network. Everyone will just get more data from customers uploading from locations near them.
For me it'll likely get worse, as more of the traffic will originate further away from me.

The reason I think advising one node per HDD is still better for most people is not just this, though. There are several reasons:

  • People shouldn't spend money on additional hardware for redundancy, as they will never recoup that cost
  • I don't want people with just a single HDD lying around to be discouraged from joining the network by thinking that having no redundancy is a problem
  • Statistically speaking, the expected loss of income from the risk of failure is very low
  • You'd be able to share more room and eventually make more money

I believe I also outlined situations in my previous post where RAID might be better; I just think they are much less common. That said, I already had a NAS with a large RAID6 array in place, and I am personally running my node on the spare space of that array. I don't deny there are exceptions, but I would honestly run multiple nodes on multiple disks if I didn't already have this large array, as I truly believe that is the better way to go when starting fresh.

4 Likes