Graceful Exit - Wanted to thank you all

Well, let’s figure this out.

To get this out of the way – nothing in life is risk-free. The goal with disk arrays is to make failures caused by disks comparable in likelihood to failures due to other events – from a supervolcano to a power surge. For the rest we have backups.

This is absolutely not true.

First, let’s separate disk failures into two buckets. Uncorrelated failures, manifesting as a bad unreadable sector or a group of sectors, are one huge bucket: the most frequently occurring failures.

Another failure mode is when the whole disk drops off the bus. These failures are much less frequent, and more often than not are correlated – HBA dies, lightning fries disk controller, etc. Increasing redundancy here won’t help, so let’s ignore this case.

So the failure we can do something about is unreadable sector(s). When this happens, you know you had all data available at the last scrub, ideally within the last month.

The probability that bad sectors developed on two disks at the exact same spots, let alone within the last month, is effectively zero, so we ignore that. (And if it’s not – we are again not dealing with uncorrelated failures, and therefore we don’t care, because more redundancy won’t help with correlated failures.)
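
As a rough sanity check on that claim, here is a minimal back-of-the-napkin sketch. It assumes bad sectors land independently and uniformly at random, and it treats "the same spot" as the same sector index on both disks, which simplifies how ZFS actually lays out records. The function name, the sector size, and the bad-sector counts are illustrative assumptions, not measurements.

```python
import math

def overlap_probability(bad_a: int, bad_b: int, total_sectors: int) -> float:
    """Chance that at least one bad sector on disk B hits the same index as
    a bad sector on disk A, assuming independent, uniform placement."""
    if bad_a + bad_b > total_sectors:
        return 1.0  # pigeonhole: an overlap is unavoidable
    # P(no overlap) = prod over i of (1 - bad_a / (total_sectors - i));
    # sum logs via log1p/expm1 to keep precision for tiny probabilities.
    log_no_overlap = sum(
        math.log1p(-bad_a / (total_sectors - i)) for i in range(bad_b)
    )
    return -math.expm1(log_no_overlap)

# Hypothetical numbers: an 8 TB disk with 4 KiB sectors, 8 bad sectors per disk.
sectors = 8 * 10**12 // 4096
print(f"{overlap_probability(8, 8, sectors):.1e}")
```

With these made-up inputs the result is a few in a hundred million. If real disks show overlaps far more often than that, the independence assumption is what broke, which is the correlated case again.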

Now, when you replace a disk in a ZFS array, the disk being replaced continues providing redundancy for all the records that are not affected by the failure, and those are repaired first.

I’ll reiterate: for the purposes of disk fault tolerance in the “bad sector driven disk replacement” scenario, raidz1 behaves like conventional raid6.
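
A toy model may make this easier to see. The sketch below is not how ZFS allocates or repairs records (raidz stripes are variable width); it only counts unreadable members per stripe. The function name, disk ids, and stripe numbers are all hypothetical.

```python
def lost_stripes(bad_sectors, parity, pulled_disk=None):
    """Return stripes that cannot be reconstructed.

    bad_sectors: dict mapping stripe id -> set of disk ids whose sector in
                 that stripe is unreadable.
    parity:      number of unreadable members a stripe can tolerate.
    pulled_disk: disk removed before the rebuild, or None if the disk
                 being replaced stays online and readable.
    """
    lost = []
    for stripe, disks in bad_sectors.items():
        unreadable = set(disks)
        if pulled_disk is not None:
            unreadable.add(pulled_disk)  # every sector of a pulled disk is gone
        if len(unreadable) > parity:
            lost.append(stripe)
    return sorted(lost)

# Hypothetical scenario: disk 0 is being replaced because of a bad sector in
# stripe 10; during the rebuild, disk 2 turns out to have one in stripe 57.
bad = {10: {0}, 57: {2}}

in_place_single = lost_stripes(bad, parity=1)               # old disk stays online
pulled_single = lost_stripes(bad, parity=1, pulled_disk=0)  # old disk pulled first
pulled_double = lost_stripes(bad, parity=2, pulled_disk=0)  # double parity, pulled

print(in_place_single, pulled_single, pulled_double)
```

In this model, single parity with the old disk still readable loses nothing, single parity with the old disk pulled loses stripe 57, and double parity with the old disk pulled loses nothing. That is the sense in which the two setups come out equivalent in this particular scenario.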

You do. You said it yourself above:

You are.

It does not.

raidz2 buys you nothing. Even if the disk falls off the bus, data on the rest of the disks is still alive and intact, and you have had proof of it within the last month. If this is too scary – make it 2x less scary by scrubbing 2x more often.
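
The "2x less scary" arithmetic can be sketched as follows, assuming new faults arrive independently at a constant rate (a Poisson model). The rate below is a made-up placeholder, and the function name is hypothetical.

```python
import math

def undetected_fault_probability(faults_per_year: float,
                                 scrub_interval_days: float) -> float:
    """Chance that at least one new fault appears between two scrubs,
    assuming faults arrive independently at a constant rate (Poisson)."""
    expected_faults = faults_per_year * scrub_interval_days / 365.0
    return -math.expm1(-expected_faults)

rate = 0.05  # hypothetical: one new latent fault per disk every 20 years
monthly = undetected_fault_probability(rate, 30)
biweekly = undetected_fault_probability(rate, 15)
print(f"30-day scrubs: {monthly:.4%}, 15-day scrubs: {biweekly:.4%}")
```

For small rates the probability is close to linear in the interval, so halving the scrub interval roughly halves the exposure.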

Which brings us here:

This is the crux of it. The issue is non-technical. People like to wear the lucky t-shirt for decades, knock on wood, and enjoy other cargo-culty activities. Nothing wrong with that. If you sleep better at night because you have one more drive in the array – this is a non-technical issue and no amount of technical reasoning will change anything. Keep doing it.

It’s irrational, but so what?

This is why hot spares exist.

Well, this is a different discussion. There are many solutions, from always buying N+1 disks to ignoring current pricing and just budgeting for some probability of failure. Over the long term disk prices trend down, so local maximums don’t matter. For what it’s worth, I have neither spare disks nor warranty (lol – I buy used remanufactured crap). When and if it fails – I’ll buy another remanufactured shite at market price. So far ONE of such disks failed in 5 years, mildly, with a single bad sector. I could have kept using it, but that seller happened to offer warranty… so I replaced it.

Why? I would buy “whatever”. If disk make/model performance variance matters – you need an SSD cache :smiley:

Here we go. I do the opposite. Run raidz1, reap first-order rewards in fault tolerance, ignore second-order anything, and walk around cool as a cucumber, not worrying about absolutely anything.

As I said, raidz2 solves a non-technical problem, so no amount of statistical back of the napkin calculations will convince anyone.

And as I said above – the fair comparison is RaidZ1 → Raid6. RaidZ2 → ??? RaidZ3 → madness and insanity.

Someone was asking how long things last? I replaced the motherboard and CPU on my wife’s PC the other day, as it started getting the blue screen thing (yes, Windows), and I knew it was time for an upgrade. When I went to put the old motherboard back in its original box I noticed the date on the box was 2011. So that PC and hard disk have been running for 15 years. And I think it’s still good: I ran chkdsk on the hard drive and it had a few errors, then I put it on the new motherboard and all is good.

Which code?

I fail to see the connection. Blue screens are kernel panics, most often due to driver issues (often GPU), memory faults (which can be triaged with memtest86), or disk corruption, because NTFS sucks.

Huh, one of my “servers” is running on 2011 gaming hardware – Z68 chipset with an Intel i7-2600k processor – and is not quitting anytime soon. That was from the time Asus was still making good products (P8Z68-V PRO/GEN3 was a solid board for the time). The disks are probably not as old, but they are 8TBs from when 8TB was all the rage.

You are right, electronics last forever. Disks are consumables. The only issue is power consumption – newer technology gets better instructions per watt. But if power is “free”, it is not a problem; for a storage server a 20-year-old CPU is more than adequate.

Unfortunately, @arrogantrabbit is right

Was a cheeky little response from the good @Alexey in another thread. I think it applies here as well. I’ve taken a good few days to read over your message and think about the points and the more I do so, the more I agree with you.

The lack of rebuttal (of which you’re often a huge receiver) would indicate that others agree.


I agree on most, if not all, of your points. I’d love to see a breakdown of how disks fail. I’ve primarily seen disks report “I’m going to die” when running scrubs and SMART tests, and have only a few times seen them suddenly stop working altogether. Most of those cases were mobile disks, mishandled disks or other non-server uses, where the motor had malfunctioned.

I’ll reiterate: for the purposes of disk fault tolerance in the “bad sector driven disk replacement” scenario, raidz1 behaves like conventional raid6.

Could you elaborate on this? Is the rationale here that, while the array is rebuilding, the damaged disk stays in the array and therefore the array has double-disk fault tolerance?

This is the crux of it. The issue is non-technical. People like to wear the lucky t-shirt for decades, knock on wood, and enjoy other cargo-culty activities. Nothing wrong with that. If you sleep better at night because you have one more drive in the array – this is a non-technical issue and no amount of technical reasoning will change anything. Keep doing it.

I hate it, but also truly admire your ability to deflate a technical “discussion” with appeals to psychology. I wholeheartedly agree with the above statement, but am left with a question which stares back at me in the mirror every night: “Do I wear a lucky t-shirt? Would I be less comfortable if I did not do so? Is it then worth it to de-equip it?”

Running two pools of RAID-z1 over RAID-z2 has its own benefits with regard to write/read amplification and the IOPS-related hits, and running two striped 5-disk RAID-z1s over a 10-wide RAID-z2 should make this difference even greater. I like performance. While arrays can now be expanded with single disks, I like the idea of getting an additional speed bonus with only 5 additional disks, and I cannot see a future where I have more than 20 disks online. 4x 5-wide RAID-z1s with a special device should make that a very performant array for my needs.

Would the additional performance be enough to outweigh my lack of a lucky t-shirt? I know not, but you make a great point.
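
For reference, the layouts mentioned above can be compared with a small sketch. It ignores raidz padding and allocation overhead, and it leans on the common rule of thumb that random IOPS scale roughly with the number of vdevs, which is an assumption here rather than a measurement. The class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int   # number of raidz vdevs striped together
    width: int   # disks per vdev
    parity: int  # parity disks per vdev (1 for raidz1, 2 for raidz2)

    @property
    def disks(self) -> int:
        return self.vdevs * self.width

    @property
    def parity_fraction(self) -> float:
        """Share of raw capacity spent on parity (ignores padding overhead)."""
        return self.parity / self.width

layouts = [
    Layout("1x 10-wide raidz2", vdevs=1, width=10, parity=2),
    Layout("2x 5-wide raidz1", vdevs=2, width=5, parity=1),
    Layout("4x 5-wide raidz1", vdevs=4, width=5, parity=1),
]

for layout in layouts:
    # Rule of thumb (an assumption here): random IOPS scale with vdev count.
    print(f"{layout.name}: {layout.disks} disks, "
          f"{layout.parity_fraction:.0%} parity, "
          f"~{layout.vdevs}x single-vdev random IOPS")
```

Under these assumptions the 10-wide raidz2 and the two 5-wide raidz1 vdevs spend the same 20% of raw space on parity, so the difference comes down to vdev count against per-vdev fault tolerance.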

Can I link to your comment in other discussions regarding the issue? I think you’ve hit the nail on the head.


In other news I instantly thought of AI when I saw the below. Is it?

Well, let’s figure this out.

Yes, precisely: you have a whole disk’s worth of redundancy accessible for the duration of the rebuild (because zfs can use data from the disk being replaced), just like raid6 would have (it needs the second parity disk because it cannot, or chooses not to, read from the disk being replaced).

An extreme case of this happened recently: I was replacing my boot drive, a DOM SSD, which started misbehaving on boot. No raid, the entire boot “pool” is a single disk. I did zpool replace ... and it replaced the only SSD with another single SSD, using the exact same mechanism – sourcing data from the (only) disk available while rebuilding onto another new disk.

No, it isn’t – because the lucky shirt still works through mechanisms other than affecting the outcome of events directly :D. For example, it gives confidence that things are less likely to fail, and this may give one a better baseline for calculated risks, ultimately changing behaviour in a way that results in a better outcome. Like a self-fulfilling prophecy.

Right, not only is z1 more performant, but multiple vdevs share the workload.

Yes! I don’t use that feature (vdev expansion) either. I figured out the optimal count of disks in a vdev – 4: it’s small enough that buying 4 disks at once is not too painful, and large enough that the “waste” of space on redundancy is acceptable (25%). I started with one vdev. Then added another one (did zfs send/zfs receive to rebalance to gain performance). Then another. Now my server has one vdev of 8TB disks, one vdev of 12TB disks, and one of 18TB disks. You can estimate the cadence at which I was adding vdevs :smiley:

I feel that 3 vdevs (and a special device) provide performance exceeding the capabilities of my 2.5Gbps network at home, so performance cannot be the reason to have three vdevs – it’s just an easy and compact way to add storage.

Sure, it’s the internet :slight_smile: anyone can link to anything…

This… is a problem. Answering your question – no, all text here, except a couple of posts in that one infamous flamy topic, was produced by gooey bits inside my skull and typed with bony bits on the plasticky thingy. I used to use “AI” in proofread mode for a while when Apple released it, but then I stopped that too, because people are now suspicious of a lack of typos lol :slight_smile:. One dude on reddit accused me of posting AI slop because of a single em-dash! I’ve been going out of my way to type em-dashes when appropriate even before Smart Punctuation existed, so I’m not about to stop – but this is the internet we created now.

So, two issues:

  • People using AI to post on forums is an obnoxious, damaging, net-negative, and annoying thing to do:
    • If one wanted to talk to AI, they can do it in a number of tools. If one posts on forums, the expectation is they want to hear from a human.
    • Being a middleman to AI is a very dubious proposition and a 100% waste of time. So I assume rational people won’t do it.
    • AI trains on forums; posting AI slop to forums will accelerate degradation of everything.
    • AI is useless for anything that matters: there is no confidence in what it produces because of the “Of course you are right, thank you for correcting me, here is a 180-degree opposite, just as plausible-sounding answer” nonsense.
  • I do use AI daily, but in a very specific way: like a sounding board. Like an improved rubber ducky. My system prompt is very limiting and specific, it forces introspection and cuts most of the “helpful” bullshit, and then I ask questions and argue with its stupid answers; doing so forces me to think things through enough to articulate them, and this is the benefit; what AI actually says is irrelevant. But as a side effect I may be taking on some of its turns of phrase inadvertently.

Funny story: at my previous job I worked closely with Nokia engineers (Windows Phone project), and after a few months my English had developed a slight Finnish accent… (I don’t speak Finnish, I just absorbed some of their accent by being exposed to it). I’m afraid “Let’s figure this out” is the same artifact. I would normally write “Let’s have a gander, shall we” but this polished “Let’s figure this out” is definitely an AI accent. That may or may not be a problem – learning slang from a tool that learned from people… I don’t know what to think about it.
