Using ZFS ARC cache on non-ECC RAM

Your example isn’t accurate. I was talking about a write operation.

Sure, that example was in your study. But the study also mentioned that this mainly occurs if the last-access timestamp (atime) is enabled on the filesystem, because only then does it need to write that data back, and atime isn’t needed anyway.
However, I don’t want to base every claim about ZFS on a study that was probably done before 2010! First make sure it still works that way (although chances are high that it does).

Also, you don’t get disqualified for a single lost piece. That means your PC must experience a lot of bit flips in RAM before you get disqualified.

Also, you haven’t considered that a bad sector on your HDD results in data loss on ext4, which would (according to your logic) also immediately result in a disqualified node… ZFS protects against that.

So to actually claim that one is worse than the other without ECC, you need to present actual facts and numbers. In pure theory, they are all bad and have weaknesses.

And even then, it’ll start crashing and rebooting long before it destroys enough data to get disqualified.

I presented a study which looked at the ZFS architecture while using non-ECC RAM.

I have presented factual information and a theoretical example.

The DQ in my step 7 is not an immediate DQ because of one error in one audit. The example shows how a single error in non-ECC RAM under ZFS can propagate to disk. Once the error has been written to disk, it persists… and thus any subsequent audit of that piece will fail.

Thus ZFS is more likely to result in an avoidable DQ for a SNO if one is using non-ECC RAM… versus the same hardware and EXT4.

You presented a very old study, and you only pulled from it what you needed to argue against ZFS. You ignored the parts about ZFS being better at handling HDD data corruption, and about ext2 also having problems with memory corruption.

This is also true for ext4. Writes to disk are typically queued (unless a synchronous write is requested, which typically only databases do), and during that time the data can get corrupted.
Or the HDD develops a bad sector, and then that data is gone too.

So to actually claim that, with the same hardware, one filesystem is more likely to result in DQ than the other, you need actual numbers for all those events occurring on the different filesystems. Otherwise there is no proof for your conclusion, only a theoretical construct showing that data can be corrupted, which you can create for any filesystem.

So unless you provide actual numbers for each of those events and compare them across the different filesystems, I’m done arguing in circles.

It doesn’t take much lost data to DQ… a few incorrectly stored bits here and there will corrupt enough files over time to DQ… The host OS may continue merrily along just fine with a reboot here or there… maybe a hiccup once or twice, while the node data rots away.

I don’t know…

Here’s a forum post of a SNO reporting strange node behavior while employing ZFS…

It seems ZFS was allowed to gobble up 85% of the system RAM, and once the ARC was limited to 20% the problems subsided. One poster indicated that the reported error showed heap corruption… and attributed that to Go.

What really happened?

Do we really know?

Is it possible that when ZFS took over 85% of RAM, a portion of that RAM image was corrupted due to the use of non-ECC RAM? It’s unlikely that the user was pairing ECC RAM with an Intel Atom processor.

I don’t think that was the claim. The claim was that with ZFS it is more important to use ECC RAM. I think at least at the time of the study there was something to be said for that. But in the grand scheme of it, that’s a small detail.

You are both right. ZFS adds a lot of protection against corruption compared to ext4 or NTFS. But there is a heavier reliance on RAM, so ECC would help more in ZFS setups than in ext4 or NTFS setups. That said, the jump from ext4/NTFS to ZFS adds a LOT more reliability than the jump from non-ECC ZFS to ECC ZFS.

And here’s someone from the STORJ Dev team excluding ZFS as the culprit in that same topic…

That was exactly his claim and that’s what is bothering me. I’m not denying there is a chance with zfs and that it might be slightly bigger than with ext4 but there is not nearly enough data in this thread to support a claim like this.

The comment includes the phrase “I don’t think”… so it does not exclude ZFS as the culprit… it is a guess. It may be correct, it may not be. No one will ever know, kinda like a Tootsie Pop.

However, the study of ZFS and non-ECC RAM suggests that ZFS may be the culprit.

In either case, ZFS only adds higher integrity when ECC RAM is used… if non-ECC RAM is used, ZFS may decrease data integrity.

@anon27637763 I appreciate your efforts to learn more. But it seems you have a tendency to dig in on rather minor details, and then confirmation bias takes over and you go looking for things that confirm your suspicions. This is often not how you find good information. Try to keep a neutral point of view when searching for info and realize that if you Google things like “ZFS problems with non-ECC memory” you will only find results that confirm that.

This is true.

However, sometimes the details are not minor.

I stand corrected. This claim was definitely wrong and the linked research never said that either.

I doubt that, however, I see one way the error could have happened.
ZFS takes pretty much all available RAM, but releases it when needed. What I think happens is that the release is not as fast as it is with the regular page cache, so if a process wants to allocate a lot of memory at once, the allocation may fail. This actually happens for me when starting a VM that has more RAM allocated than there is “free” RAM (not used by ZFS or other processes). I just get an error and the virsh start command fails; then I have to reduce zfs_arc_max to free up some RAM and start the VM again. I have not seen this happen with regular software.
So, something like this could have happened: the server in that link has only 4 GB of RAM, and when the storage node tried to allocate more memory for garbage collection, the allocation failed and the node crashed. All of my servers that use ZFS have more RAM, so I only notice the problem when starting VMs and not with other software.
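
If anyone wants to try the same workaround, here is a minimal sketch (Python, assuming a Linux host running OpenZFS, where the ARC cap is exposed as the module parameter /sys/module/zfs/parameters/zfs_arc_max) that just computes a cap of roughly 20% of total RAM; the fraction and the decision to apply it are up to you:

```python
# Minimal sketch, illustrative only: compute a zfs_arc_max value of roughly
# 20% of total RAM. Assumes a Linux host with OpenZFS, where the ARC cap is
# the module parameter /sys/module/zfs/parameters/zfs_arc_max (in bytes).

def total_ram_bytes() -> int:
    """Read MemTotal from /proc/meminfo (reported in kB)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")

def suggested_arc_max(fraction: float = 0.20) -> int:
    """Return an ARC cap in bytes as a fraction of total RAM."""
    return int(total_ram_bytes() * fraction)

if __name__ == "__main__":
    cap = suggested_arc_max()
    print(f"Suggested zfs_arc_max: {cap} bytes (~{cap / 2**30:.1f} GiB)")
    # To apply at runtime (as root, not persistent across reboots):
    #   echo <value> > /sys/module/zfs/parameters/zfs_arc_max
```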

As for RAM corruption - it happens. I have seen ECC errors on a couple of my servers; the strange thing is that they come and go - an error once every few days for a while, and then they go away for months with no reboot or anything.

Yes… it does say what I’m writing…

Quoted from the Summary and Discussion:

Our results for memory corruptions indicate cases where bad data is returned to the user, operations silently fail, and the whole system crashes. Our probability analysis shows that one single bit flip has small but non-negligible chances to cause failures such as reading/writing corrupt data and system crashing.

We argue that file systems should be designed with end-to-end data integrity as a goal. File systems should not only provide protection against disk corruptions, but also aim to protect data from memory corruptions. Although dealing with memory corruptions is hard, we conclude by discussing some techniques that file systems can use to increase protection against memory corruptions.

Block-level checksums in the page cache:

File systems could protect the vulnerable data and metadata blocks in the page cache by using checksums. For example, ZFS could use the checksums inside block pointers in the page cache, update them on block updates, and verify the checksums on reads. However, this does incur an overhead in computation as well as some complexity in implementation; these are always the tradeoffs one has to make for reliability.

Metadata checksums in the heap:

Even with block-level checksums in the page cache, there are still copies of metadata structures in the heap that are vulnerable to memory corruptions. To provide end-to-end data integrity, data-structure checksums may be useful in protecting in-heap metadata structures.
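
To make the study’s first suggestion a bit more concrete, here is a purely hypothetical sketch (not how ZFS is actually implemented) of a cache that stores a checksum next to each cached block and verifies it on every read, so a bit flip in the cached copy is detected instead of being silently returned or flushed to disk:

```python
# Purely hypothetical sketch of "block-level checksums in the page cache",
# illustrating the mitigation proposed in the study. This is NOT the actual
# ZFS implementation; it only shows the idea: checksum when a block is cached
# or updated, verify the checksum on every read. The verification cost is the
# computation/complexity tradeoff the study mentions.
import zlib

class ChecksumError(Exception):
    pass

class ChecksummedCache:
    def __init__(self) -> None:
        # block number -> (data, CRC32 of data at the time it was cached)
        self._blocks: dict[int, tuple[bytes, int]] = {}

    def put(self, blkno: int, data: bytes) -> None:
        """Cache or update a block and record its checksum."""
        self._blocks[blkno] = (data, zlib.crc32(data))

    def get(self, blkno: int) -> bytes:
        """Return a cached block, verifying its checksum first."""
        data, crc = self._blocks[blkno]
        if zlib.crc32(data) != crc:
            # Corruption of the cached copy is detected here instead of
            # being returned to the caller or flushed back to disk.
            raise ChecksumError(f"block {blkno} corrupted in cache")
        return data

# Usage: simulate a bit flip in the cached copy; the next read catches it.
cache = ChecksummedCache()
cache.put(7, b"piece data")
flipped = bytearray(b"piece data")
flipped[0] ^= 0x01
cache._blocks[7] = (bytes(flipped), cache._blocks[7][1])  # simulated bit flip
try:
    cache.get(7)
except ChecksumError as e:
    print(e)  # "block 7 corrupted in cache"
```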

None of that says that ZFS with non-ECC is more likely to have data corruption than ext4 with non-ECC. And that’s frankly ridiculous considering the many added features of ZFS to prevent data corruption. That may not have been what you meant to say, but it is what you said.

This research simply points out one specific failure that may be more common, but in the grand scheme of things that’s negligible compared to the improvements ZFS provides over ext4.

I’ll add a link to another post on this topic.
https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
And a quote straight from a ZFS developer at the bottom of that topic.

I don’t care about your logic! I wish to appeal to authority!

OK. “Authority” in this case doesn’t get much better than Matthew Ahrens, one of the cofounders of ZFS at Sun Microsystems and current ZFS developer at Delphix. In the comments to one of my filesystem articles on Ars Technica, Matthew said “There’s nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem.”

May I also suggest a more neutral Google search:
https://www.google.com/search?q=is+ecc+ram+required+for+zfs&oq=is+ecc+ram+required+for+zfs

No…

I think I wrote “may”, not “is”.

We are discussing statistical probabilities. My argument is an extension of the ZFS study.

If ZFS with non-ECC memory copies memory errors to disk after a read, then ZFS may produce worse reliability than ext4, which does not copy memory errors to disk after a read.

You didn’t. But that’s ok. I don’t want to be arguing about specific words.

As for that comparison: disk writes on ext4 still go through memory. If that memory has a hard fault (which your study mentions is by far the most common), it doesn’t matter how long the data was in memory; the error would be written to the HDD on ext4 no matter what. The only additional risk on ZFS is soft errors occurring while read-cached data sits in memory before being written back to that block. While this is an added risk, it’s the less common kind of error, and in total it adds only a very small additional risk that in no way outweighs all the additional protection that ZFS’s checksumming and other data-protection features add compared to ext4.

It doesn’t take long to find out the difference in reliability between zfs and ext4. If you were right, every result you find would mention the requirement for ECC. And it’s barely ever mentioned.

Well, there it is again, the common myth that ZFS “requires” ECC memory. Usually it’s about scrubbing though, not ARC.

I’ll go along with your reasoning for a bit. I believe you are saying that memory errors will occur more often when using ZFS because ZFS allegedly uses RAM more than conventional filesystems do.

According to a report by Google, some of their servers experienced a couple hundred memory errors per year. That is extremely insignificant compared to how many successful reads must have taken place in that time.

Oh no, your audit percentage drops from 99.9999999% to just 99.99%!
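
For scale, a back-of-envelope calculation with purely illustrative numbers (only the “couple hundred errors per year” figure comes from the Google report mentioned above, and it assumes the worst case that every single error corrupts a stored piece; the piece count is hypothetical):

```python
# Back-of-envelope, purely illustrative numbers: if a node stores N pieces
# and a handful of them become unreadable over a year due to memory errors,
# what fraction of random audits would still succeed?
stored_pieces = 1_000_000      # hypothetical piece count on the node
corrupted_pieces = 200         # "a couple hundred" errors/year, worst case:
                               # every error assumed to land in a stored piece
audit_success_rate = 1 - corrupted_pieces / stored_pieces
print(f"Expected audit success rate: {audit_success_rate:.4%}")  # 99.9800%
```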

Also, do you know what disk error rates are? https://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

The real problem with ZFS is the metadata overhead, wasting space that you could make money with. Unless you share some space on an existing pool or want to create a new redundant pool for Storj, there’s no reason to use ZFS for a single disk.

I’ll conclude with this quote:

A Complete Guide to FREENAS Hardware Design, Part I, Purpose and Best Practices

This is probably the most contested issue surrounding ZFS (the filesystem that FreeNAS uses to store your data) today. I’ve run ZFS with ECC RAM and I’ve run it without. I’ve been involved in the FreeNAS community for many years and have seen people argue that ECC is required and others argue that it is a pointless waste of money. ZFS does something no other filesystem you’ll have available to you does: it checksums your data, and it checksums the metadata used by ZFS, and it checksums the checksums. If your data is corrupted in memory before it is written, ZFS will happily write (and checksum) the corrupted data. Additionally, ZFS has no pre-mount consistency checker or tool that can repair filesystem damage. This is very nice when dealing with large storage arrays as a 64TB pool can be mounted in seconds, even after a bad shutdown. However if a non-ECC memory module goes haywire, it can cause irreparable damage to your ZFS pool that can cause complete loss of the storage. For this reason, I highly recommend the use of ECC RAM with “mission-critical” ZFS. Systems with ECC RAM will correct single bit errors on the fly, and will halt the system before they can do any damage to the array if multiple bit errors are detected. If it’s imperative that your ZFS based system must always be available, ECC RAM is a requirement. If it’s only some level of annoying (slightly, moderately…) that you need to restore your ZFS system from backups, non-ECC RAM will fit the bill.

It just says that, in addition to using ZFS, using ECC memory is one additional good step to take for data integrity. There is nothing about ZFS that requires ECC memory more than other filesystems.

I personally use ECC memory for my home server, it was worth the price. I would never consider buying it for a Storj server though.