Error opening database on storagenode

node1 · June 10, 2024, 4:53pm

Another day, another problem.
The node stops regularly. After I restart the server, or sometimes just the docker container, it runs without problems for a few hours and then stops again.

e2fsck found some problems a few days ago, and all of them were fixed. Then a few DBs were corrupted, with errors like “file is not a database” and similar. I moved all the DBs away, started the node, stopped it, and moved back all the good DBs. The node ran fine for 4-8 hours and then stopped again.
Now I’ve checked one more time with e2fsck; it found 3 problems and fixed them. It finds a few problems every time the node stops.
smartmontools doesn’t see any problems with the drive either: both the short and the long test pass. SMART passed - OK.

I should mention that this server suffered a few power outages this month. The problems started after that.

Another node on the same server running fine.

How do I solve this problem?

T.Y.

e2fsck -f /dev/sda1
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Inode 55268176 extent tree (at level 1) could be shorter. Optimize? yes to all
Inode 55290561 extent tree (at level 1) could be shorter. Optimize? yes

Inode 56810310 extent tree (at level 1) could be shorter. Optimize? yes

Inode 81803911 extent tree (at level 1) could be shorter. Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda1: 21701866/183144448 files (0.2% non-contiguous), 971257731/1465129488 blocks
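The step of sorting good databases from corrupted ones can be sketched with the sqlite3 command-line tool and its `PRAGMA integrity_check`. This is only a rough sketch: the function name `check_dbs` is made up for the example, the directory you pass in is whatever holds the node's `.db` files, and it assumes the node is stopped while the check runs.

```shell
# Sketch: report which *.db files in a directory pass SQLite's integrity check.
# Assumptions: sqlite3 is installed and the node is stopped.
# check_dbs is a hypothetical helper name, not part of any Storj tooling.
check_dbs() {
    dir="$1"
    for db in "$dir"/*.db; do
        # Skip the literal pattern when the directory has no .db files.
        [ -e "$db" ] || continue
        # A healthy database prints exactly "ok"; anything else
        # (including "file is not a database") is treated as bad.
        result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1) || true
        if [ "$result" = "ok" ]; then
            echo "OK  $(basename "$db")"
        else
            echo "BAD $(basename "$db")"
        fi
    done
}
```

Calling `check_dbs /path/to/storage` (the path is a placeholder) prints one line per database, so only the files marked OK need to be moved back.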

Mitsos · June 10, 2024, 5:40pm

Your system isn’t stable, so how do you expect it not to have any problems? You need to focus on making it stable, and the problems will go away. You can’t have filesystem corruption every “4-8” hours.

Alexey · June 11, 2024, 3:48am

This means that you need to run fsck again, and keep doing so until it ends in a clean state (no modifications after the check).
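A minimal sketch of that "repeat until clean" loop, relying on the exit status documented in the e2fsck man page (0 means no errors, 1 or 2 mean errors were corrected, 4 and above mean something was left uncorrected or the run failed). The names `FSCK_CMD`, `MAX_RUNS` and `fsck_until_clean` are made up for this example, and it assumes the filesystem is unmounted and the node is stopped.

```shell
# Sketch: repeat a forced check until e2fsck reports a clean pass.
# Assumptions: the filesystem is unmounted; FSCK_CMD and MAX_RUNS are
# hypothetical names used only in this example.
FSCK_CMD="${FSCK_CMD:-e2fsck -f -y}"   # -f: force the check, -y: answer "yes" to all prompts
MAX_RUNS="${MAX_RUNS:-5}"

fsck_until_clean() {
    dev="$1"
    run=1
    while [ "$run" -le "$MAX_RUNS" ]; do
        # Left unquoted on purpose so the command and its flags split into words.
        $FSCK_CMD "$dev"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "pass $run: clean"
            return 0
        fi
        if [ "$rc" -ge 4 ]; then
            # Uncorrected errors or an operational failure: do not keep looping.
            echo "pass $run: e2fsck exit code $rc, stopping"
            return "$rc"
        fi
        echo "pass $run: errors were corrected (exit code $rc), checking again"
        run=$((run + 1))
    done
    echo "still not clean after $MAX_RUNS passes"
    return 1
}
```

It would be run as root, for example `fsck_until_clean /dev/sda1`. If it never reaches a clean pass, that points back at the hardware or power stability rather than at the filesystem alone.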

node1 · June 11, 2024, 5:38am

Thank you @Alexey. I’ve had a situation where e2fsck did not find any problems, but the node still went down after a day or two.

At the moment everything looks fine. We’ll see how it goes:

user@user:~ $ sudo e2fsck -f /dev/sda1
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda1: 22668921/183144448 files (0.5% non-contiguous), 1007989266/1465129488 blocks
user@user:~ $

Alexey · June 11, 2024, 5:41am

There could be a different reason, such as a writability or readability check failing due to a timeout (that would show up as a FATAL error).
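One way to look for that is to filter the node's log for FATAL lines. The sketch below reads a log on stdin; the helper name `summarize_fatal` is made up, and matching on the words "readability" and "writability" is an assumption about how those checks appear in the log, so adjust the pattern to what your log actually shows.

```shell
# Sketch: print FATAL lines from a log on stdin, then count how many of them
# mention the readability/writability checks.
# summarize_fatal is a hypothetical helper; the search words are an assumption.
summarize_fatal() {
    fatal=$(grep -F "FATAL") || true
    if [ -z "$fatal" ]; then
        echo "no FATAL lines found"
        return 0
    fi
    printf '%s\n' "$fatal"
    checks=$(printf '%s\n' "$fatal" | grep -Eic "readability|writability") || true
    echo "FATAL lines mentioning readability/writability: $checks"
}

# Example use, assuming the container is named "storagenode":
#   docker logs storagenode 2>&1 | summarize_fatal
```

If the count is not zero, the disk is probably responding too slowly at those moments, which would explain the node stopping even after a clean e2fsck run.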