Question of fue hd

BrightSilence · February 27, 2020, 3:21pm

Yeah, apart from the vetting argument, none of that matters because it’s ALREADY FACTORED INTO the 2% number. I don’t care about rated numbers if we have real-world data for similar use. And either way, your warranty will be honored.

As for vetting, I would say you can spin up the second node anywhere between the first node being vetted and 75% full. You probably don’t want to go completely overboard and run 10 nodes from day one because you’d be wasting power more than anything, but there is no problem spinning up 2 or 3 nodes before they fill up if you want to spread the risk a little.

anon27637763 · February 27, 2020, 3:36pm

This completely misses the reality that Storj network traffic is inconsistent and unpredictable. Nodes started at staggered times are not guaranteed to pick up the traffic lost when any given node is DQ-ed for any reason… failed drive or otherwise. And each new node exists in a different network environment… meaning that node 1 might be the only node within a given /24 subnet… but maybe by month 3, there are 2 more operators in that same subnet. Therefore, your node 2 is now competing for traffic with your node 1 as well as with other operators’ nodes.

There’s really no reason to run more than a single node on the same subnet, unless your current one is filled… There will be no extra payout. And if staggered nodes are started rather than waiting for node 1 to fill, the risk of payout loss is decreased moderately…

However, why do that at all? The first node isn’t filled anyway and is unlikely to be filled anytime in the near future. So, why not just run the second drive in a RAID configuration, since that would ensure more reliability and lower maintenance complexity?

If Storj network traffic were such that moderately allocated nodes filled very fast, a single drive per node would make more sense… but only if the node operator was deploying WD-Red drives. But this is not my experience, nor seemingly that of any other SNO I’ve seen post here. Certainly, any node running standard-issue consumer drives in a single-drive-per-node configuration is operating at the extreme end of the performance curve.

anon27637763 · February 27, 2020, 3:53pm

No. That’s not a correct statement.

The 2% failure rate is a statistical measure across a population of drives operating within the expected specified design environment.

Running a consumer level HDD as a Storj drive places that drive on the edge or beyond the expected design environment. Therefore, the 2% claimed failure rate is not applicable.

BrightSilence · February 27, 2020, 4:07pm

Do you seriously think Backblaze is turning these drives off for most of the day? You’re just wrong… I can’t spin it any other way. They have extensive writing about the types of systems these disks are used in and they are 24/7 and used for cloud storage. The use is nearly identical to the kind of use you would expect for Storj.

Pentium100 · February 27, 2020, 4:22pm

If there is more than one operator in the same /24, then running multiple nodes will get you more data.

Example: there is one other operator in the same /24 as me. We both run one node each. Our nodes each get 50% of the traffic. I start two more nodes, now there are 4 nodes total, each gets 25% of the traffic, but since 3 of those nodes are mine, I get 75% of the combined traffic.
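The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: it assumes traffic for a /24 is split evenly across every node in it and that all nodes are fully vetted, and the function name `operator_share` is made up for the sketch.

```python
from fractions import Fraction


def operator_share(my_nodes: int, other_nodes: int) -> Fraction:
    """Fraction of a /24 subnet's traffic that lands on my nodes.

    Assumption (not a measured fact): traffic for the subnet is divided
    evenly across all nodes in it, and every node is fully vetted.
    """
    if my_nodes < 0 or other_nodes < 0:
        raise ValueError("node counts must not be negative")
    total = my_nodes + other_nodes
    if total == 0:
        raise ValueError("the subnet has no nodes")
    return Fraction(my_nodes, total)


# One node each: the traffic is split in half.
print(operator_share(1, 1))  # 1/2

# I add two more nodes while the other operator still runs one.
print(operator_share(3, 1))  # 3/4

# Alone in the subnet: extra nodes do not change my share at all.
print(operator_share(3, 0))  # 1
```

The last call shows why both positions in the thread can hold at once: extra nodes only raise your share when someone else is already in the same /24.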

It’s unpredictable now, but if a lot of people start using it the traffic will have patterns as thousands of people would not cooperate to keep the traffic unpredictable. Of course, right now the biggest traffic generator is Storj itself, so the traffic can be made unpredictable.

anon27637763 · February 27, 2020, 4:34pm

What’s the statistical swing of that 2% failure rate testing on Backblaze’s 0.0673 % of shipped HDDs in 3rd Quarter 2019?

anon27637763 · February 27, 2020, 4:37pm

Sounds right for precisely balanced nodes, but I haven’t tested it…

EDIT:

Now that I think about it… the statement could only be true for fully vetted nodes. Since more nodes will result in slower vetting, spinning up 2 extra nodes in order to accumulate a larger percentage of the offered traffic will require a long lead time before the excess traffic will result in excess payout. And each newly started node will be required to run at least long enough to have the early escrow paid out. Therefore, the above RAID vs. Single drive argument still applies… but then one is running multiple nodes each with its own RAID… I can’t see that paying out very well due to the increased electricity cost.

But… as I wrote… I haven’t tested it. Are there any node operators who have? It would be fascinating to see if the expected traffic increase occurred.

BrightSilence · February 27, 2020, 5:14pm

Dude, stop reaching. Look at the numbers they have reported over the past few years if you really want to know. If you have a better, larger sample, I’m all for looking at it. Their numbers are mostly consistent and any inconsistencies are explained. 2019 was actually a relatively bad year due to some firmware issues they had. Even so, the numbers only went from 1.7% to 1.9%. I’m already rounding to 2% for no good reason except to make it simpler to calculate with. If you have better data, bring it on. But your attacks on this sample set have no basis in reality. You just want it to be wrong.

anon27637763 · February 27, 2020, 5:23pm

I’m not reaching…

If there are half a billion drives shipped in 3rd quarter 2019, and BackBlaze says that of their 120,000 drives 2% failed, what is the standard error in the half billion population of drives?

The reported 2% failure rate has a large standard error when applied across the entire population of drives. Therefore, the 2% number is not meaningful to determine the expected failure rates in the general population of drives. The BackBlaze sample size is too small to make a claim of 2% precision across the population of drives. Of course, the sample size is larger than other experiments… but that doesn’t mean anything. The sample size is still too small.

Due to the unavailability of any real-world test data with a large enough sample size to make precise claims of failure rates in the single-digit percent range… the hard drive manufacturer’s datasheet is the best resource for calculating expected drive failure.

BrightSilence · February 27, 2020, 5:55pm

Yes, you are definitely reaching. Latest numbers I can find say that production of hdds is around 330 million yearly.

Please note I even went as far as to go with a 99% confidence level.
And also note that with a sample size of 120,000 the population size really doesn’t matter anymore. You can add tons of additional 0’s to the end of the population figure and the confidence interval doesn’t change at all.
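A rough way to check that claim is to compute the margin of error with and without the finite population correction. The sketch below is a simplification under stated assumptions: it treats the 2% figure as a plain proportion over 120,000 drives (not an annualized rate over drive-days), uses the normal approximation, and the function name `margin_of_error` is invented for the example. It only addresses sampling error, not whether the sample is representative.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.99, population=None):
    """Half-width of a normal-approximation confidence interval for a proportion.

    p          -- observed proportion (0.02 for a 2% failure rate)
    n          -- sample size
    population -- total population size; None means "treat as infinite"
    """
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: shrinks the error when the sample
        # is a noticeable fraction of the whole population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


P_FAIL = 0.02      # failure rate quoted in the thread
SAMPLE = 120_000   # approximate sample size quoted in the thread

infinite = margin_of_error(P_FAIL, SAMPLE)
print(f"N=infinite: +/- {infinite * 100:.4f} percentage points")

# Population sizes mentioned in the thread, plus one with an extra zero.
for population in (330_000_000, 500_000_000, 5_000_000_000):
    finite = margin_of_error(P_FAIL, SAMPLE, population=population)
    print(f"N={population:>13,}: +/- {finite * 100:.4f} percentage points")
```

Under these assumptions the margin comes out at roughly a tenth of a percentage point, and it barely moves as the population grows, because the correction factor is already very close to 1.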

You got anything else or can we stop this nonsense now? The numbers are perfectly fine and the sample size is actually amazing for this kind of test.

anon27637763 · February 27, 2020, 6:14pm

And which drives exactly are in the sample we are speaking of?

Make and model and size?

Let’s go to the actual raw data. Backblaze helpfully posts the raw data. This is a very good thing, and I am quite impressed that they provide it.

But let’s take a peek at a single day in Nov 2019:

grep -nc "" 2019-11-19.csv
121295

Sample size = 121295-1 = 121294 drives

12 TB Seagate drives:

grep -ncE "ST12000" 2019-11-19.csv
40710

8 TB Seagate drives:

grep -ncE "ST800" 2019-11-19.csv
24223

4 TB Hitachi drives:

grep -ncE "HGST\ HMS5C404" 2019-11-19.csv
15528

Total drives of these 2 brands and 3 types:

24223 + 40710 + 15528 = 80461

Percent of total:

80461 / 121294 = 66.335 %

So, if a Storj node operator is running a 12 or 8 TB Seagate drive or a 4TB Hitachi drive, the BackBlaze data may be at least a little useful.

How many Seagate and Hitachi drives are there in the list?

grep -ncE "ST[0-9]{3}" 2019-11-19.csv
87123

grep -ncE "HGST" 2019-11-19.csv
29009

87123 + 29009 = 116132

Percent of total drives that are either Hitachi or Seagate:

116132 / 121294 = 95.744 %

Therefore, if a Storage Node operator is running a node using WD-Red drives, the BackBlaze statistics provide zero insight into possible drive reliability.
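The grep counts above match a pattern anywhere in a line, so a serial number that happens to contain the same characters would also be counted. A hedged alternative is to tally the model column directly. The sketch below assumes the daily CSV has a header row with a `model` column. The sample rows are invented placeholders (only the ST12000NM0007 and ST12000NM0008 model names come from this thread), and the helper names are made up.

```python
import csv
import io
from collections import Counter

# Invented stand-in for a daily snapshot file; the real file is far larger.
# The column names are an assumption about the file layout.
SAMPLE_CSV = """date,serial_number,model,failure
2019-11-19,SERIAL-A,ST12000NM0007,0
2019-11-19,SERIAL-B,ST12000NM0007,0
2019-11-19,SERIAL-C,ST12000NM0008,0
2019-11-19,SERIAL-D,ST800-EXAMPLE,0
2019-11-19,SERIAL-E,HGST HMS5C404-EXAMPLE,0
"""


def model_counts(csv_file) -> Counter:
    """Count rows per value of the 'model' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["model"] for row in reader)


def share_with_prefix(counts: Counter, prefix: str) -> float:
    """Fraction of all drives whose model name starts with the given prefix."""
    total = sum(counts.values())
    matched = sum(n for model, n in counts.items() if model.startswith(prefix))
    return matched / total


counts = model_counts(io.StringIO(SAMPLE_CSV))
for model, n in counts.most_common():
    print(f"{n:>3}  {model}")
print(f"ST12000 share: {share_with_prefix(counts, 'ST12000'):.1%}")
```

To run it against a real file, something like `model_counts(open("2019-11-19.csv", newline=""))` should work as long as the column-name assumption holds.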

BrightSilence · February 27, 2020, 6:19pm

Great argument if you’re suggesting red is less reliable… Try and see how that argument goes. You’re all over the place. The best evidence we have says it’s below 2%. I’m not arguing with you anymore unless you show any evidence that backs up your claims that other hdds are worse. Any significant evidence at all. It doesn’t even have to be anything close to 120k hdds. I’ll take anything over 10000. Good luck!

anon27637763 · February 27, 2020, 6:25pm

Let’s look a little deeper at the BackBlaze data and find out precisely what kind of drives are under test…

Let’s do Seagate 12 TB first… because those drives are about 33% of the sample.

All Seagate 12 TB drives in the sample:

grep -ncE "ST12000" 2019-11-19.csv
40710

ST12000NM0007 or …8 drives

grep -ncE "ST12000NM000[7-8]{1}" 2019-11-19.csv
40710

And what is a ST12000NM0007 or …8 drive anyway?

Oh… It’s an Enterprise Level Seagate Exos

OK… So now we know that a huge portion of the BackBlaze stats are based on a single manufacturer’s Enterprise Level product…

Well, sure a 2% failure rate might accurately represent that market.

“I’m not arguing with you anymore unless you show any evidence that backs up your claims that other hdds are worse.”

You asked me to click the link and look at the data. I did.

And the data clearly show that the BackBlaze reported failure rates include at least 33% enterprise level hard drives from a single manufacturer, and even a single model from that manufacturer. Therefore, any unbiased observer will obviously see that the BackBlaze data is of limited use when discussing consumer hardware running at enterprise level specifications.

I don’t need to provide any data showing that consumer hardware would operate less reliably. It’s enough to know that the enterprise level drives will necessarily cause the failure rates of the BackBlaze tests to be much lower than those of consumer hardware.

BrightSilence · February 27, 2020, 6:45pm

So no, you’re not going to provide evidence of any higher failure rates anywhere? Ok then, as announced, I’m done.

PS. The HDD you point out has the worst failure rate, so your argument doesn’t actually work for you, but let’s just ignore that part.

anon27637763 · February 27, 2020, 6:54pm

The November BackBlaze data also have a large chunk of Seagate 4TB desktop consumer drives…

Guess which drives are denoted as failed in the month of Nov 2019?

If you guessed the Seagate Desktop drives, you would be correct.

Therefore, the BackBlaze data supports my argument precisely. Drive failure for consumer level HDDs is a concern and is higher than the overall rollup failure rate posted by BackBlaze… even within the single manufacturer… Seagate.

BrightSilence · February 27, 2020, 7:13pm

That is rich… as they literally wrote an article about the fact that the failure rates are the same.

Scroll down a bit.

Want newer stats… Sure. That consumer HDD you’re talking about. Exactly 2% and the worst consumer HDD of the bunch.

Every time you make an argument it just doesn’t add up. There is literally only one model with a higher than 2% failure rate and it’s the Enterprise drive you mentioned. There was a firmware issue with that drive, which Backblaze has written about extensively. If you look at consumer drives only, the percentage is actually much closer to 1% than 2%.

And yet, you’ve still not provided a single piece of evidence for your claim that failure rates are actually worse…

anon27637763 · February 27, 2020, 7:18pm

Most of the above drives are Enterprise level drives. And only two of the 6 most popular manufacturers are listed. How is the above list of drives a representative sample of drives likely to be used in a Storj Node?

OK… Let’s do a poll…

Which drive listed in the above stats does anyone here have running in their Storj Node?

My answer: None.

BrightSilence · February 27, 2020, 7:23pm

No dude… Show any statistics of drives that are worse first. Then we can go on.

anon27637763 · February 27, 2020, 7:27pm

The BackBlaze data is simply not applicable to the argument. It’s that simple.

It is interesting… and it’s also interesting that, as the sample size of a given make and model of drive increases in the general population of drives, the failure rate of that make and model tends to head towards 2 to 3%… However, it is invalid to argue that drive manufacturers, or makes and models of drives, are statistically interchangeable.

In short, the BackBlaze sample is not representative of a randomly distributed set of consumer hardware. Since this is the indisputable reality… the BackBlaze data are not applicable.

QED.

BrightSilence · February 27, 2020, 7:33pm

QED comes at the end of a proof, not a rambling set of unsupported statements. But congrats on the win anyway. It’s all yours.