Using ZFS ARC cache on non-ECC RAM

No. It doesn’t say that at all.

It says, explicitly and succinctly, that non-ECC ZFS systems may completely destroy the data in the array, while ECC ZFS systems will probably not do so.

Pretty much, non-ECC ZFS is susceptible to self-destruction upon memory failure.

And, no, other FSes are not susceptible to the same self-destruction.

Yeah, well, there are two elements in that post I can quickly respond to. First, it mentions that if the data is corrupted in memory on write, the checksums will not prevent that corrupted data from being written to disk. That statement simply says that the additional protections in ZFS don’t prevent this specific issue, which would exist on any file system. So in this respect ZFS is no better than other file systems. It definitely doesn’t say it is worse. Because it isn’t.
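To make that first point concrete, here’s a minimal, purely illustrative Python sketch (not ZFS code; sha256 just stands in for ZFS’s internal checksums) of why no write-time checksum can catch data that was already flipped in RAM before the checksum was computed:

```python
import hashlib

def write_block(buf: bytes) -> tuple[bytes, bytes]:
    """Simulate any checksumming filesystem writing a block: the checksum is
    computed over whatever happens to be in RAM at write time."""
    return buf, hashlib.sha256(buf).digest()  # data and checksum both go to disk

good_data = b"perfectly good data"
corrupted_in_ram = b"perfectly go0d data"  # a bit flipped before the write path saw it

stored_data, stored_checksum = write_block(corrupted_in_ram)

# On read (or scrub) the checksum verifies fine, because it was computed over the
# already-corrupted buffer -- the corruption is invisible to the filesystem.
assert hashlib.sha256(stored_data).digest() == stored_checksum
```

The checksum faithfully protects whatever was in the buffer at write time, so this particular failure mode is the same on ZFS as on any other file system.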

Then it moves on to say it can corrupt the entire array. Please note that it gives no explanation of how that would happen. I believe they are referring to the data scrubbing “issue” that @Derkades mentioned. The theory there is that data scrubbing with bad memory could corrupt all the data, or a significant amount of it. This was debunked in the article I posted earlier:
Will ZFS and non-ECC RAM kill your data? – JRS Systems: the blog

The problem is, the scenario as written doesn’t actually make sense. For one thing, even if you have a particular address in RAM with a stuck bit, you aren’t going to have your entire filesystem run through that address. That’s not how memory management works, and if it were how memory management works, you wouldn’t even have managed to boot the system: it would have crashed and burned horribly when it failed to load the operating system in the first place. So no, you might corrupt a block here and there, but you’re not going to wring the entire filesystem through a shredder block by precious block.

But we’re being cheap here. Say you only corrupt one block in 5,000 this way. That would still be hellacious. So let’s examine the more reasonable idea of corrupting some data due to bad RAM during a scrub. And let’s assume that we have RAM that not only isn’t working 100% properly, but is actively goddamn evil and trying its naive but enthusiastic best to specifically kill your data during a scrub:

First, you read a block. This block is good. It is perfectly good data written to a perfectly good disk with a perfectly matching checksum. But that block is read into evil RAM, and the evil RAM flips some bits. Perhaps those bits are in the data itself, or perhaps those bits are in the checksum. Either way, your perfectly good block now does not appear to match its checksum, and since we’re scrubbing, ZFS will attempt to actually repair the “bad” block on disk. Uh-oh! What now?

Next, you read a copy of the same block – this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize – you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did).

Apologies for the rather long quote…
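Since the quote is long, here is the same repair logic condensed into a rough Python sketch. It assumes a simple two-copy (mirror-style) layout and is not ZFS’s actual implementation; the point is just that a block only gets rewritten when at least one copy still validates against its checksum, and when none do, an error is logged and nothing on disk is touched:

```python
import hashlib

def valid(copy: bytes, checksum: bytes) -> bool:
    # sha256 stands in for ZFS's internal checksums (fletcher4/sha256).
    return hashlib.sha256(copy).digest() == checksum

def scrub_block(copies: list[bytes], checksum: bytes) -> str:
    """Simplified scrub of one block. The redundant copies could come from a
    mirror, RAID-Z parity reconstruction, or copies=2; the decision flow is
    the same either way."""
    good = [c for c in copies if valid(c, checksum)]
    if len(good) == len(copies):
        return "all copies match: nothing to do"
    if good:
        # Self-heal: rewrite the failing copies from one that validates.
        return "bad copies rewritten from a validating copy"
    # No copy validates (e.g. bits flipped in RAM on every read):
    # log an unrecoverable error and leave the on-disk data untouched.
    return "checksum error logged; nothing overwritten"
```

A later scrub simply runs the same check again; if the copies then read back cleanly, the logged error clears, exactly as described above.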


Is this the default ZFS configuration?

Is there a requirement here for a specific drive configuration and ZFS options to be set for this redundant copy of the same data block?

If so, then the argument is won on both sides:

  1. ZFS requires specific settings to avoid the known problem with non-ECC memory.
  2. Since we are talking about non-OS drives… specifically dedicated Storj node drives, the OS isn’t going to be affected by a self-destructing ZFS array.
  3. Lost data pieces will definitely cause node reputation to decrease, and 1 out of 5000 corrupted blocks would probably DQ… I think that’s about 128 KB out of every 625 MB… Many of my blobs are 1.3 KB…
    …Late night thought error…

1 block out of 5000:

In the last 15 days my node stored 198,379 files, about 70% of which are less than 1 ZFS block in size. So, 1 in 5000 would be 39 data pieces destroyed over the last 15 days of operation.
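Checking the arithmetic behind those figures with a quick Python snippet (the 128 KiB default recordsize and the one-block-per-file approximation are assumptions, since most of the files are smaller than a single block):

```python
record_kib = 128          # default ZFS recordsize of 128 KiB (assumption; small
                          # files use smaller records, as noted for the ~1.3 KB blobs)
blocks = 5000             # "1 block out of 5000"
files_15_days = 198_379   # files stored in the last 15 days

print(record_kib * blocks / 1024)   # 625.0 -> roughly 128 KiB lost per 625 MiB
print(files_15_days // blocks)      # 39    -> ~39 pieces lost at a 1-in-5000 rate
```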

  1. If ZFS requires extra drives to store extra copies of data so that non-ECC memory doesn’t corrupt data, then why use it over other FSes, which also require extra copies of data so that spurious drive errors or failures don’t corrupt data?

I’m calling it a wash… as long as the specific necessary ZFS configuration is detailed.

And amend my original statement to:

If using ZFS without ECC memory, make sure you know what you are doing.

This is about a possible issue with data scrubbing, which is only done on arrays with parity or mirrored copies. So yes, this is not just the default; it’s by definition the case if you run data scrubbing.

Again, no more than any other file system

You’re talking about memory corruption causing this. Memory corruption impacts the entire system, including the OS.

This was a completely random example and is much worse than it would likely ever be. Your OS would definitely not survive this amount of memory corruption. And none of that even matters, since the explanation that follows shows that even this scenario in most cases DOESN’T cause actual corruption of the data. Either it restores the data based on a copy or parity, in which case the good data is overwritten with the same good data, or it fails the checksum, in which case no data is overwritten at all.

Because, despite what you keep ignoring, the checksumming systems in ZFS add a LOT of extra protection and self-healing capability that other file systems simply don’t have. I’m not going to go into all of the advantages of ZFS, as you can just Google them. But needless to say, they exist.

Based on what calculation?

The specificity of mentioning ZFS here still makes no sense whatsoever. I’ll remind you of the ZFS developers’ quote.

There’s nothing special about ZFS that requires/encourages the use of ECC RAM more than any other file system.


This is a great example of how two people can draw a different conclusion from the same text.

Nah, it is only one person drawing a different conclusion than even the study itself does. Bright and I seem to understand it the same way.


Data scrubbing is recommended here by some guy at Oracle to be performed once a month or so.

Now that I’m looking at more of the picture, I see that my basic premise was incorrect. However, when using ZFS with non-ECC memory, one loses the ability to fully rely on data scrubbing…

Might be a ZFS convert here. I think I could like 2010… again.


For the ZFS scrubbing without ECC issue, you may want to watch this video: https://www.youtube.com/watch?v=52x4PSxbjUg