Does this cyclic redundancy check error mean that the whole log file is dead?
What can we do with it? As far as I know, write-hashtbl throws an error and stops, so simply rebuilding it is not an option.
I’d suspect a hardware failure then (cabling, HDD, maybe memory). If fsck (or whatever equivalent you’re using) does not find the problem, that simply means the problem is intermittent.
Still, it’s a file-system-level report, not a storage node one. It means the storage node asked the file system to preserve a piece of data (here, a log file), and the file system now reports that it cannot retrieve that data when the node asks for it.
It could still be a memory or cabling problem: if system RAM had a bad bit at write time, or the cable flipped a bit, then the bad bit was written to the drive and is now read back from the same exact position every time.
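To illustrate why that produces a *persistent* CRC error: a single flipped bit changes the checksum, and once the corrupted bytes are on disk, every subsequent read reproduces the same mismatch. A minimal Python sketch (the sample data is made up, not actual storagenode content):

```python
import zlib

# Pretend this is a record the node asked the file system to store.
data = bytearray(b"storagenode log record")
good_crc = zlib.crc32(bytes(data))

# Simulate one bit flipped by bad RAM or cabling *before* the write,
# so the corrupted version is what actually lands on the drive.
data[5] ^= 0x01
bad_crc = zlib.crc32(bytes(data))

# The stored checksum no longer matches the stored bytes, and it never
# will again: the same corrupted position is read back every time.
assert bad_crc != good_crc
```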
Clearly it’s in the storage node log. A disk or memory error would generally show up in the relevant OS logs.
Accessing the same file and receiving the same outcome multiple times indicates an issue with the file, which may or may not be caused by a disk/memory/OS issue, but which still needs to be fixed by the application in control of that file, in this case the storagenode.
This statement is not internally consistent. An application can neither fix nor be expected to work around an issue with the hardware/filesystem/OS/etc. What it can do is ignore such failures gracefully. Better yet, crash immediately, so that the operator can debug the issue.
If you cannot trust a written file, this is a critical time to abort; attempting to carry on has the potential of corrupting even more data.
That assumes it actually is a hardware/filesystem/OS issue. Cryptic error messages that only mean something to developers won’t help most operators. Hence my original message.
If the application retries the operation and it still fails, we know it’s not the application and can point the finger elsewhere, with a nice error message: “Possible faulty hardware detected on /path/to/faulty/file. Please check.”