Of course there was an issue; I said in the first post that I lost 0.5% of the data. When you have time to fix things about once a week on average, you can see how that easily turns into an ordeal. The corruption happened because a drive was failing; that has since been fixed. ZFS couldn't repair a lot of it because the ZFS checksum isn't ECC, it's just an error-detection checksum, so without a redundant copy to rebuild from nothing gets fixed. I hadn't known that before deploying this system.
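To be clear about why nothing gets fixed: on a pool with no redundancy, a scrub can only tell you what's broken, because there is no second copy to rebuild from. A rough sketch (the pool name is a placeholder):

```
# "tank" is a placeholder pool name. Without a mirror, raidz, or
# copies=2, a scrub can only detect bad blocks; there is no second
# copy to repair them from.
zpool scrub tank
zpool status -v tank   # lists files with permanent (unrepairable) errors
```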
It can. If there is corruption, ZFS throws I/O errors; it absolutely refuses to read that data. In fact it is impossible to get at this data at all; it is effectively destroyed, because there is no supported way to access it. There is supposedly a switch that lets you read it anyway, but it doesn't seem to work. Furthermore, the node is badly written: if it encounters an I/O error, it exits after a while. So one bad file = catastrophic node failure, which is not very sensible.
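To show what I mean by "impossible to get to": any read of an affected file just fails with an I/O error, and that is what eventually kills the node. Roughly (the path here is a made-up example):

```
# Hypothetical path, as it would appear in the `zpool status -v` list.
cat /tank/storagenode/blobs/damaged.sj1 > /dev/null
# cat: /tank/storagenode/blobs/damaged.sj1: Input/output error
```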
Yeah, that is where it's most likely to manifest; the same goes for other pool layouts, though, once you've lost a drive or two.
Exactly. ZFS is completely useless there; I wasn't aware of that before.
It doesn't; if it did that, there would be no problem. What it actually does is read the data, see the bad checksum, and refuse to serve it; it does nothing with it. Not only that, it also won't let you overwrite the bad data in place: you have to delete it first, and you'd better hope it's not a directory, because from what I've seen that doesn't always work.
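For reference, the cleanup that is supposed to work goes roughly like this (pool and path are placeholders); even then, from what I understand, entries only drop off the permanent-error list after a later scrub, since ZFS keeps the log from the previous one:

```
rm /tank/storagenode/blobs/damaged.sj1   # remove the unreadable file
zpool clear tank                         # reset the error counters
zpool scrub tank                         # entries only leave the
                                         # permanent-error list after a
                                         # subsequent scrub completes
```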
ZFS is meant to be used with changing data; it is perfect for it. Fragmentation is not an issue with ZFS: data is always somewhat fragmented because it's copy-on-write, and that's by design, not a problem.
OP knows they could just delete the problem files to eliminate the OS-level errors… and the node would run fine (and maybe fail some audits, but it's statistically unlikely), but they simply don't want to.
You are wrong; I never said I don't want to, there is simply no way to. There is no facility for it; ZFS isn't designed for data corruption. I tried to delete the files listed in the `zpool status -v` output: it deleted most of them, but not all, and on top of that it refused to delete some folders because they weren't empty. And they weren't empty because it wouldn't let me delete everything inside them. Again, ZFS isn't designed well enough to handle corruption in all cases. I don't insist on feeding bad data to the node; I'm trying whatever I can. Nothing works; it's not fixable. First ZFS isn't designed to handle corruption, then it isn't designed to delete broken data. And then the node throws its hands in the air and refuses to work because it hit an error accessing one file.
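For the record, the bulk delete I attempted looked roughly like this (the pool name is a placeholder, and parsing `zpool status` output like this is fragile). Note that entries printed as `dataset:<0x...>` instead of a path are objects ZFS can no longer resolve to a filename, so there is nothing to rm for those; I assume that's part of why it deleted most but not all:

```
# Pull the absolute paths out of the permanent-errors section and
# delete them. Entries shown as dataset:<0x...> have no resolvable
# path, and directories fail here because rm -f does not recurse.
zpool status -v tank \
  | sed -n 's/^[[:space:]]*\(\/.*\)/\1/p' \
  | while IFS= read -r f; do rm -f -- "$f"; done
```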
All bad designs. But you generally have a hard time explaining these things to the developers, because their answer will be (as yours is) “well, you're not expected to do that, of course it doesn't work”. That is just non-robust design.
The good thing about ZFS is that there are no bad-block problems: ZFS is copy-on-write, so each time you write something it goes to some free location, never the same place, even if you only make a small change to a file.
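You can actually watch this happen with zdb if you're curious; a rough sketch, where the dataset name and object number are hypothetical (on ZFS, the inode number that `ls -i` reports is the object number):

```
ls -i /tank/data/file         # the inode number is the ZFS object number
zdb -dddddd tank/data 128     # 128 = hypothetical object number; note the
                              # block addresses under "Indirect blocks"
dd if=/dev/urandom of=/tank/data/file bs=4k count=1 conv=notrunc
sync
zdb -dddddd tank/data 128     # same offsets inside the file, different
                              # on-disk addresses: nothing is rewritten
                              # in place
```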