It doesn’t take much lost data to get a node disqualified (DQ)… a few incorrectly stored bits here and there will corrupt enough files over time to trigger it. The host OS may continue merrily along with a reboot here or there… maybe a hiccup once or twice, while the node data rots away.
I don’t know…
Here’s a forum post from a SNO reporting strange node behavior while running ZFS…
It seems ZFS was allowed to gobble up 85% of the system RAM, and once it was limited to 20% the problems subsided. One poster indicated that the reported error showed heap corruption… and attributed that to Go.
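For anyone wondering what a 20% cap looks like in practice, here’s a rough sketch. The post doesn’t say which OS was involved or how the limit was applied, so this assumes OpenZFS on Linux, where the `zfs_arc_max` module parameter caps the ARC size in bytes. The 16 GiB machine and the helper names `arc_max_bytes` and `modprobe_line` are made up for illustration.

```python
GIB = 1024 ** 3


def arc_max_bytes(total_ram_bytes: int, percent: int) -> int:
    """Return the ARC cap in bytes for a whole-number percentage of RAM."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer math avoids float rounding surprises on large byte counts.
    return total_ram_bytes * percent // 100


def modprobe_line(total_ram_bytes: int, percent: int) -> str:
    """Format the cap as a modprobe options line (assumes OpenZFS on Linux)."""
    return f"options zfs zfs_arc_max={arc_max_bytes(total_ram_bytes, percent)}"


# Hypothetical 16 GiB host, capped at 20% as described in the thread.
print(modprobe_line(16 * GIB, 20))
```

On a Linux box the resulting line would typically go in a file such as `/etc/modprobe.d/zfs.conf`; other platforms set the ARC limit differently.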
What really happened?
Do we really know?
Is it possible that when ZFS took over 85% of RAM, a portion of that RAM image was corrupted due to the use of non-ECC RAM? It’s unlikely that the user was running ECC RAM with an Intel Atom processor.