Question of fue hd

No it’s not. You’re jumping to the worst case, so I’m jumping to the chance of that worst case happening based on the data we have. It’s completely fair.

I’m granting you a lot of stuff in my calculations. Annual failure rates are actually lower than the 2% I mentioned; the latest numbers show 1.89%. Instead of going down to 0.15% I gave you 0.2% monthly. I clearly stated that my calculation was for the cost in the worst month. Obviously there would be a recurring part to that cost, but you would have to remove the held amount from that calculation, and it would be much more honest to go with average monthly income without surge payouts, which my guess is would be closer to $20. So sure, $20 * 0.2% = 4 cents for every month after that. I’m sorry I didn’t mention those cents specifically. Let’s also ignore the fact that you’ll have a new node up and running by then, so those 4 cents are only really valid for 1 month.
Please notice that I’m also still going along with just having one node, which you wouldn’t have in this scenario, as you would be using the multiple HDDs you would have used in RAID as separate nodes. And while that doesn’t change the cost calculation, since you’re still dealing with the same averages, it does spread the risk: if a node fails, it will be one out of three in the RAID5 case.

If you keep skipping over the chance of a bad event happening in your calculations, of course it sounds bad. But you’re blowing up the impact 50+ times (on a yearly basis) by skipping over that little detail. You also keep comparing a node on RAID vs a single node on a single HDD, which is not what anyone is advising you do.
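The expected-cost arithmetic from a few posts up can be sketched as a quick calculation. All the inputs are the figures quoted in this thread (a ~2% annual failure rate rounded to 0.2% monthly, and an assumed $20/month average income without surge payouts), not authoritative numbers:

```python
# Expected monthly cost of losing a single node, using the figures
# quoted in this thread (both inputs are assumptions from the posts):
afr = 0.02                # ~2% annual failure rate (Backblaze-style)
monthly_failure = 0.002   # afr / 12 is ~0.17%, rounded up to 0.2%
monthly_income = 20.0     # assumed USD/month without surge payouts

expected_cost = monthly_income * monthly_failure
print(f"expected cost: ${expected_cost:.2f}/month")  # $0.04/month
```

The point being made is that the risk has to be weighted by its probability: a total loss of $20 in a month, at a 0.2% monthly chance, is an expected 4 cents per month.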

Well, if I had 6 nodes on separate drives instead of 1 node on RAID6, those nodes would be pretty much empty, I would definitely not earn more money.
As for the failure chance - yes it’s rather low, but the impact is not just loss of payment for one month.
If my node was disqualified at the end of December, it would be in vetting in January, so no traffic, and in February it would get at most 0.8TB (the current ingress of my node) of data, instead of keeping the 4+TB it has now.

But this is during testing. We can hope that the traffic is going to increase over time as Tardigrade is used by more clients. This would mean that a node that was started later would have more money held in escrow than a node that was started earlier.

There is also another angle - why do something knowing that it is likely to fail (over a couple of years, but still) and not meet the requirements? For me it’s like making an ice sculpture - if I spent that much time and effort in making something, I’d rather make it out of something more durable.

You’d make the same amount until the point where you’d run out of space on the RAID6. After that you make more.

I gave a per month cost for this exact reason.

Or the exact opposite, since surge payouts are over. We don’t know where it will go. So let’s just say we don’t know.

It’s likely to fail in 50 years. Not a couple of years. You mentioned 6 nodes. That UNlikely failure will most likely happen on only one node if it happens at all. You’d only be losing less than 1/6th of your potential income if you’d be losing any at all.
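The "50 years, not a couple of years" claim follows directly from compounding a constant 2% annual failure rate; here is a minimal sketch, assuming independent failures and a rate that stays flat (in reality it rises with drive age):

```python
# Chance of at least one drive failure over a period, assuming a
# constant 2% annual failure rate and independent failures.
afr = 0.02

def p_failure(years: float, drives: int = 1) -> float:
    """Probability that at least one of `drives` fails within `years`."""
    return 1 - (1 - afr) ** (years * drives)

print(f"{p_failure(50):.0%}")    # ~64%: likely over 50 years
print(f"{p_failure(2):.0%}")     # ~4%: unlikely over a couple of years
print(f"{p_failure(1, 6):.0%}")  # ~11%: one of six nodes in a year
```

And even in the six-node case, a single failure costs at most 1/6th of the potential income, which is the trade-off being argued for here.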

I get the comparisons… though ice sculpture is definitely creative ;). The thing is, this isn’t about losing data or a sculpture if you will. It’s about losing $$$. And that risk can be quantified and calculated and you can easily know if something makes sense or not. In your case, it’s probably not that likely that you’ll go over 2/3rd of your total HDD capacity. I would argue that that is not the case for most people. So yes, there is not a one size fits all best solution. In your case RAID might be the better option, but in most cases, it probably is not.

I don’t think that a lot of hard drives that I can buy now would last 20 or 30 years, much less 50. I am pretty sure the helium would leak out if the drive is helium-filled. In use or just sitting on a shelf. Older drives seemed to be better, but then again, there are a lot of failed old drives.

Which may take years, increasing the chance that some drives would fail. Right now my node uses 4.7TB out of 16TB. And if I do run out of space, expanding the array is better, since if I created a new node, it would be in vetting, while I can expand a current node without loss of traffic.

Well, if the traffic stays at 4 Mbps like it is now, then running a node is not really worth it anyway. Running more than one node even more so, since the payment is the same but the management overhead is higher, even if I ran all of the nodes on the same server with separate dedicated drives.

Let’s just say I really dislike doing the same thing multiple times, especially if I can do it properly once.
I did create a lot of nodes in v2 though, and I did not like that. “Oh, look, one node got hit by ‘timeoutrate’, which never resets but also drops traffic to almost zero; time to move the data to another server and create a new node.” But at least in v2, the new node would have full reputation in a week or so and there was no escrow. Also, having multiple active nodes meant more data and money, so there was a good thing about using multiple nodes.

I just do not feel right recommending people a method that has higher risk of failure. I recommend RAID (and other redundancies) and if someone does not want to use it, they can, but then expect higher chance of failure. I don’t know, it’s like recommending an ultra-cheap router. It may work great, but if it doesn’t, I’ll feel responsible for a bad recommendation. So I recommend Mikrotik (or using a PC/server as router). If it’s too expensive, well, use whatever, but I am not going to recommend the cheapest one.

I think that if you recommend no RAID, you should also recommend that the person create some backup nodes (with minimal allowed storage and bandwidth, optionally with reduced speed or increased latency, so that those nodes do not get full), so that when the main node fails, they can replace it easily with a node that has already passed the vetting phase etc.

Going back to the requirements: if Storj really wanted people to run nodes on single drives etc., why make the requirements that strict? If the requirements are being updated now, then I’ll wait until the new version to comment on that; right now I only have the current version. I do not want to think that Storj made the requirements like this and then recommends a setup that does not really meet them in order to make sure that the node operators never get the escrow money, so I’ll assume it’s just some internal contradiction (I saw some of those in v2).

Right, to be fair, the numbers I’m using are from Backblaze, and they take HDDs out of commission after 5 years. So these numbers apply to the first 5 years, and it’s fair to assume they chose to only use HDDs for 5 years for a reason. Failure rates likely go up a bit after that.

Yeah, Backblaze statistics are nice, though the problem, as always, is that if I know that some model of drive had the lowest failure rate in the last 5 years, I cannot buy it anymore.
2% AFR for 5 years (or whatever the warranty period of the drive is) seems reasonable. Using RAID for the node it would be possible to use the drives for longer, until they actually start failing before replacing them instead of preemptively replacing them after the warranty expires. Though normally the drives in an array should be at least from different batches so they do not all fail at the same time. Still, it would be nice to have backups, but what can you do?

Seagate Warranty Lengths

Regular 2.5 inch == 2 Years
Regular 3.5 inch == 2 Years
Pro == 5 Years

I haven’t checked, but I imagine it’s similar across the industry. The warranty period for regular consumer HDDs is not 5 years… In Seagate’s case, it’s 2 years. And I imagine it’s about the same with other manufacturers.

Looking at more detailed specs for Seagate consumer level drives:

Load/Unload Cycles                        ...    600,000
Nonrecoverable Read Errors per bits read  ...    1 per 10^14
Power-on hours per year                   ...    2400
Workload Rate Limit (TB/year)             ...    55
Limited Warranty (years)                  ...    2

Jan 2020, my node experienced 5.26 TB Egress and 657.20 GB Ingress. So, that’s a minimum drive transfer of 5.9172 TB. Jan 2020 was a stress test. However, under full load for a full year, if I was running consumer hardware, my drive would exceed the yearly Workload Rate Limit.

5.9172 * 12 = 71.0064 TB / year.

Also the 2400 hours per year of power on condition is exceeded…

365 * 24 = 8760 hours

So… running a storage node on consumer hardware stresses that hardware well beyond the designed specifications.

The drives are powered on for roughly 365% of the spec’d hours (8760 vs 2400), and may experience roughly 129% of the rated transfer workload (71 vs 55 TB/year). Therefore, a claim of a 2% failure rate, and that drives are reasonably reliable with long warranty periods, simply does not apply to the conditions that a storage node will experience.
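The spec comparison above can be reproduced in a few lines, using the Jan 2020 traffic figures and the consumer-drive spec table quoted earlier in the thread:

```python
# Comparing one month of observed node traffic against the quoted
# consumer drive spec (55 TB/year workload, 2400 power-on hours/year).
egress_tb = 5.26
ingress_tb = 0.6572
monthly_tb = egress_tb + ingress_tb     # 5.9172 TB moved in Jan 2020
annual_tb = monthly_tb * 12             # 71.0064 TB/year

workload_limit_tb = 55                  # spec'd TB/year
poh_spec, poh_actual = 2400, 365 * 24   # rated hours vs 24/7 uptime

print(f"workload: {annual_tb / workload_limit_tb:.2f}x the rating")  # 1.29x
print(f"power-on: {poh_actual / poh_spec:.2f}x the rating")          # 3.65x
```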

Which is why you should use at least WD RED drives, they are designed for 24/7 use, though they really like to unload the heads (resulting in more wear and high latency), so the operator needs to configure the drives so that they do not do that. Seagate IronWolf is the equivalent.

I use WD Purple drives for surveillance work; they are very good.

WD Red Datasheet

Warranty period == 3 years
Workload == 180 TB/year

That works better, but WD-Red drives are as much as twice the price of other drives with a 2 year warranty. If one wishes to get a 5 year warranty, one will need to purchase WD Red Pro, which are about 2.5 times the price of WD-Red drives…

However, in general, any consumer HDD is going to be experiencing the extreme edge of the drive’s designed performance specification.

Backblaze lists a few models from the series you mentioned. I doubt that they stay below the rate limit and I’m almost certain they use the HDD’s 24/7.

The rated numbers by the manufacturer are more to segment the drives for different use cases than to actually display what they can handle. In my experience, most desktop drives will do fine in 24/7 use cases. Despite not officially being rated for that.

That said, I definitely do recommend NAS class drives for use in any form of RAID setup. Lately I’ve been using ironwolfs and I’m pretty happy with them so far. Ironwolfs tend to be a little cheaper and faster, at the trade off of some noise and heat and probably some more power use as well. I can live with that.

I use 2x Seagate Barracuda (I used them for v2 before) and 4x HGST Ultrastar drives in my RAID6 array. All drives are 7200RPM. If both Barracudas fail, I still have the newer Ultrastars.

OK…

But one can not make a 2% failure claim if one is typically running a consumer HDD at roughly 365% of the spec’d power-on time and probably near or over the limit of the spec’d data transfer.

One can if it has been tested… Like backblaze did. Click the link please.

They did not test my drive.

It doesn’t matter how many drives are tested… statistics do not distribute to the individual. Statistics can only be used as an expected value across an entire population.

If you’re just going to dismiss the best method we have of determining HDD reliability, the conversation is kind of over, don’t you think? How do you propose we determine reliability then?

Also please note that while they may not have tested your specific model, they test a wide range of mostly desktop and NAS class drives, and the failure rates don’t differ all that much from each other. An average failure rate of 2% annually is more than fair to assume even if your HDD is not listed.

Barracuda is the worst HDD I have ever seen; I have had a lot of failures with them.

Let’s post factual information from the above thread…

  1. Unless a Storj operator is using WD Red NAS drives, the power-on time for a 24/7 node exceeds a consumer HDD’s rated hours by roughly 3.5 times.
  2. No consumer level HDD has a 5 year warranty.
  3. In order to get a 5 year warranty, a Pro/Enterprise level drive must be purchased.
  4. Running a Storj node requires that the hardware is being used at the extreme end of the performance curve… meaning the “worst case” (not my words) scenario is made more likely.
  5. If a node is DQ-ed that node loses its escrow as well as any potential future payout.
  6. Storj network traffic is unpredictable and is a memoryless system, meaning what happened last month has no effect on what happens next month and there is no predictive variable for determining either future or past network traffic.
  7. Running two or more nodes simultaneously on the same /24 subnet does not result in more traffic, but does result in each node being used less often and therefore filling up more slowly and accumulating lower future escrow payouts.

Because of the above, it is likely that more drives will fail than the expected statistical measure of 2% per year… and therefore more single drive nodes will be expected to be DQ-ed…

Therefore, I can not recommend that node operators run the recommended single drive per node configuration.

Not sure what you mean here; you’re twisting yourself into a knot. It’s simple: as long as your HDDs don’t fill up, they make the exact same amount. You seem to suggest they make less, which simply isn’t true.

There is absolutely no reason to assume this at all. If you make a claim, you have to back it up with something. Backblaze, not unlike Storj, is a cloud storage provider. Their HDD loads are likely very comparable to Storj’s. They definitely exceed all the same specs on the consumer desktop HDDs in their tests. You don’t have to add risk that is already included in the tests. We can disagree about the best solution, but let’s not disagree about the data. There is enough of that going on in politics.

Btw, I don’t disagree about any of the first 6 points you mentioned. It’s just that their impact is already accounted for in these numbers.

If a node operator is running one node, that node receives all the ingress, egress, and audit traffic. New nodes that do not have at least 100 audits receive less ingress. Therefore, it is entirely possible that running two nodes simultaneously will actually result in a lower payout than running a single node. This is why the current recommendation is to run a single node until it is full and then spin up an additional node.

However, as I’ve detailed above, a node with a reasonably moderate storage space will fill very slowly. My own node has 3.5 TB reserved and has accumulated a mere 2.4 TB over a six month time frame. Therefore, per current recommendations, I would be expected to still be running a single node with a single drive…

If that were the case… and I spun up my node with typical consumer hardware… and ran it successfully for 6 months…

  1. Power-on time == 26*7*24 = 4368 hours
  2. Traffic =~ 20 TB

As you can see, I would already have used about 22 of the 24 months of power-on time allowed by the warranty period. And the traffic is approaching the maximum spec’d workload. In other words, the apparent age of my consumer hardware would already be nearing the end-of-warranty period… at 6 months of use.
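The warranty-budget arithmetic works out like this, assuming the 2-year / 2400 power-on-hours-per-year consumer spec quoted earlier:

```python
# Warranty budget consumed by 6 months of 24/7 operation, assuming
# the consumer spec quoted above: 2-year warranty, 2400 power-on
# hours per year.
poh_budget = 2400 * 2     # 4800 hours covered by the warranty
hours_6mo = 26 * 7 * 24   # 4368 hours of continuous uptime

warranty_months_used = hours_6mo / poh_budget * 24
print(f"{warranty_months_used:.1f} of 24 warranty-months used")  # 21.8
```

In other words, half a year of 24/7 uptime burns nearly the entire power-on-hour allowance of a 2-year consumer warranty.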