Fatal Error on my Node

The problem is related to slow disk writes. We have not reached a common conclusion yet, but disks are usually slow on writes if they are:

  • SMR drives
  • used for something else as well (for example, more than one node on the same disk, or the system disk used for the node)
  • network-attached drives or network filesystems (SMB/CIFS, NFS, etc.)
  • external disks

The current suggestion is to check the disk for errors and fix them. If there are no issues with the disk itself and it’s just slow, you may increase the write timeout a little (the current value is 1m0s; try increasing it in 30s steps).

No bug was found in the new version. Instead, we fixed an old bug: the readability and writeability checkers had no timeout at all and could hang forever in the background if the disk was too slow or too busy, or if the setup had hardware issues. These checkers were designed to protect your node from disqualification when the disk is not writeable or not readable, but they did not work when the disk only partially hung. Now that we have added a timeout, this disk unavailability has become visible.
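To make the idea concrete, here is a rough sketch of a writeability check with a timeout. It is not the node's actual code: the function name `verify_dir_writable`, the `DiskCheckTimeout` exception, and the probe file prefix are all made up for illustration, and the 60 second default only mirrors the 1m0s mentioned above.

```python
import os
import tempfile
import threading


class DiskCheckTimeout(Exception):
    """Hypothetical error: the write check did not finish in time."""


def verify_dir_writable(path, timeout_s=60.0):
    """Write and remove a small probe file in `path`.

    Raises DiskCheckTimeout if the write takes longer than `timeout_s`
    seconds, or the underlying OSError if the write fails outright.
    """
    result = {}

    def probe():
        try:
            fd, name = tempfile.mkstemp(prefix="write-test-", dir=path)
            try:
                os.write(fd, b"probe")
                os.fsync(fd)  # force the data to the disk, not just the cache
            finally:
                os.close(fd)
                os.remove(name)
            result["error"] = None
        except OSError as exc:
            result["error"] = exc

    # A thread blocked on disk I/O cannot be cancelled, so run the probe in a
    # daemon thread and simply stop waiting once the timeout has passed.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        raise DiskCheckTimeout(
            f"write check on {path!r} did not finish within {timeout_s}s"
        )
    if result["error"] is not None:
        raise result["error"]
```

Without the `join(timeout_s)` limit, a call like this would wait for as long as the disk hangs, which is the old behaviour described above. With the limit, a slow or stuck disk shows up as an explicit error.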
