Hello friends,
Is anyone here using ext4 with fast commit enabled?
I was running kernel 5.15.102, and since enabling fast commit I have seen the storagenode software hang on multiple nodes, with a kernel error like the one in the log below.
I saw that newer 5.15 kernel releases include some ext4 fixes, so I updated to 5.15.142, but yesterday and today the issue appeared again on two different updated nodes, each running on a different drive.
The underlying drives and the hypervisor are all healthy, and I had no issues with ext4 before enabling fast commit, which is supposedly stable at this point.
If you are running ext4 with this option enabled, which kernel version are you on?
Thank you.
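In case it makes replies easier to compare, here is a minimal sketch for reporting the kernel release together with whether a filesystem actually has the feature turned on. It assumes e2fsprogs is installed and that `dumpe2fs -h` lists `fast_commit` on its "Filesystem features:" line. The function names are my own, and the device path is whatever you pass on the command line.

```python
import platform
import subprocess
import sys


def has_fast_commit(dumpe2fs_header: str) -> bool:
    """Return True if the 'Filesystem features:' line lists fast_commit."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            features = line.split(":", 1)[1].split()
            return "fast_commit" in features
    return False


def report(device: str) -> str:
    """Build a one-line summary for a device (hypothetical helper)."""
    # 'dumpe2fs -h' prints only the superblock header; it usually needs root.
    header = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    ).stdout
    return (f"kernel={platform.release()} device={device} "
            f"fast_commit={has_fast_commit(header)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    print(report(sys.argv[1]))
```

Run it as root with the block device as the only argument and paste the single output line into your reply.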
[Fri Jan 12 15:15:45 2024] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
[Fri Jan 12 15:15:45 2024] ext4 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[Fri Jan 12 15:47:17 2024] hrtimer: interrupt took 2569412 ns
[Fri Jan 12 15:55:57 2024] kworker/dying (2916) used greatest stack depth: 12296 bytes left
[Sat Jan 13 00:48:26 2024] kworker/dying (7525) used greatest stack depth: 12200 bytes left
[Sat Jan 13 04:12:41 2024] kworker/dying (10041) used greatest stack depth: 12008 bytes left
[Sat Jan 13 14:59:24 2024] BUG: kernel NULL pointer dereference, address: 0000000000000080
[Sat Jan 13 14:59:24 2024] #PF: supervisor read access in kernel mode
[Sat Jan 13 14:59:24 2024] #PF: error_code(0x0000) - not-present page
[Sat Jan 13 14:59:24 2024] PGD 800000011e399067 P4D 800000011e399067 PUD 11e39a067 PMD 0
[Sat Jan 13 14:59:24 2024] Oops: 0000 [#1] SMP PTI
[Sat Jan 13 14:59:24 2024] CPU: 0 PID: 2389 Comm: storagenode Not tainted 5.15.142-gentoo #1
[Sat Jan 13 14:59:24 2024] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[Sat Jan 13 14:59:24 2024] RIP: 0010:jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024] Code: 01 48 8b 77 20 85 c0 7f 4c 48 c7 44 24 08 00 00 00 00 48 8b 7e 30 48 89 e6 48 c7 44 24 20 00 00 00 00 c7 44 24 20 01 00 00 00 <48> 8b 87 80 00 00 00 48 01 c0 48 89 04 24 48 8b 43 30 48 89 44 24
[Sat Jan 13 14:59:24 2024] RSP: 0018:ffffc90000897dc0 EFLAGS: 00010246
[Sat Jan 13 14:59:24 2024] RAX: 0000000000000000 RBX: ffff8880045418c0 RCX: 0000000000000000
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000001 RSI: ffffc90000897dc0 RDI: 0000000000000000
[Sat Jan 13 14:59:24 2024] RBP: ffffc90000897ed8 R08: 0000000000000006 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000238 R11: ffffffffffffffff R12: ffffc90000897e98
[Sat Jan 13 14:59:24 2024] R13: ffff888013a9f958 R14: ffff888101725e80 R15: ffffc90000897e80
[Sat Jan 13 14:59:24 2024] FS: 00007efc8d685b38(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[Sat Jan 13 14:59:24 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080 CR3: 000000011a658004 CR4: 0000000000170ef0
[Sat Jan 13 14:59:24 2024] Call Trace:
[Sat Jan 13 14:59:24 2024] <TASK>
[Sat Jan 13 14:59:24 2024] ? __die_body.cold+0x1a/0x1f
[Sat Jan 13 14:59:24 2024] ? page_fault_oops+0xa9/0x250
[Sat Jan 13 14:59:24 2024] ? search_exception_tables+0x33/0x50
[Sat Jan 13 14:59:24 2024] ? search_module_extables+0x5/0x40
[Sat Jan 13 14:59:24 2024] ? exc_page_fault+0x71/0x140
[Sat Jan 13 14:59:24 2024] ? asm_exc_page_fault+0x22/0x30
[Sat Jan 13 14:59:24 2024] ? jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024] ext4_fc_commit+0x29a/0x8d0
[Sat Jan 13 14:59:24 2024] ? file_check_and_advance_wb_err+0x27/0xb0
[Sat Jan 13 14:59:24 2024] ext4_sync_file+0xd8/0x340
[Sat Jan 13 14:59:24 2024] __x64_sys_fsync+0x32/0x60
[Sat Jan 13 14:59:24 2024] do_syscall_64+0x42/0x90
[Sat Jan 13 14:59:24 2024] entry_SYSCALL_64_after_hwframe+0x62/0xcc
[Sat Jan 13 14:59:24 2024] RIP: 0033:0x4074ce
[Sat Jan 13 14:59:24 2024] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[Sat Jan 13 14:59:24 2024] RSP: 002b:000000c00137a968 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
[Sat Jan 13 14:59:24 2024] RAX: ffffffffffffffda RBX: 0000000000000077 RCX: 00000000004074ce
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000077
[Sat Jan 13 14:59:24 2024] RBP: 000000c00137a9a8 R08: 0000000000000000 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[Sat Jan 13 14:59:24 2024] R13: 22aeffeeaeaeaaa0 R14: 000000c008bf5380 R15: 0000000000000090
[Sat Jan 13 14:59:24 2024] </TASK>
[Sat Jan 13 14:59:24 2024] Modules linked in:
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080
[Sat Jan 13 14:59:24 2024] ---[ end trace fb6b44467200858f ]---
[Sat Jan 13 14:59:24 2024] RIP: 0010:jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024] Code: 01 48 8b 77 20 85 c0 7f 4c 48 c7 44 24 08 00 00 00 00 48 8b 7e 30 48 89 e6 48 c7 44 24 20 00 00 00 00 c7 44 24 20 01 00 00 00 <48> 8b 87 80 00 00 00 48 01 c0 48 89 04 24 48 8b 43 30 48 89 44 24
[Sat Jan 13 14:59:24 2024] RSP: 0018:ffffc90000897dc0 EFLAGS: 00010246
[Sat Jan 13 14:59:24 2024] RAX: 0000000000000000 RBX: ffff8880045418c0 RCX: 0000000000000000
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000001 RSI: ffffc90000897dc0 RDI: 0000000000000000
[Sat Jan 13 14:59:24 2024] RBP: ffffc90000897ed8 R08: 0000000000000006 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000238 R11: ffffffffffffffff R12: ffffc90000897e98
[Sat Jan 13 14:59:24 2024] R13: ffff888013a9f958 R14: ffff888101725e80 R15: ffffc90000897e80
[Sat Jan 13 14:59:24 2024] FS: 00007efc8d685b38(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[Sat Jan 13 14:59:24 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080 CR3: 000000011a658004 CR4: 0000000000170ef0
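For anyone comparing their own logs against this one, the sketch below pulls the key facts out of an oops like the one above: the faulting function and offset from the kernel-mode `RIP:` line, the fault address from `CR2:`, and the confirmed call-trace frames (the ones without a leading `?`). The regular expressions assume the `dmesg -T` layout shown in this post, and `summarize_oops` is just a name I picked.

```python
import re

# Kernel-mode RIP line, e.g. "RIP: 0010:func+0x70/0xd0"
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+(0x[0-9a-f]+)/0x[0-9a-f]+")
# Faulting address as recorded in CR2
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")
# Call-trace frame without the "?" marker, e.g. "] ext4_fc_commit+0x29a/0x8d0"
FRAME_RE = re.compile(r"\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def summarize_oops(log: str) -> dict:
    """Extract faulting function, offset, fault address and trace frames."""
    summary = {"function": None, "offset": None,
               "fault_address": None, "frames": []}
    in_trace = False
    for line in log.splitlines():
        if summary["function"] is None:
            m = RIP_RE.search(line)
            if m:
                summary["function"] = m.group(1)
                summary["offset"] = int(m.group(2), 16)
        if summary["fault_address"] is None:
            m = CR2_RE.search(line)
            if m:
                summary["fault_address"] = int(m.group(1), 16)
        if "<TASK>" in line:
            in_trace = True
        elif "</TASK>" in line:
            in_trace = False
        elif in_trace:
            m = FRAME_RE.search(line)
            if m:
                summary["frames"].append(m.group(1))
    return summary
```

Fed the log above, it should report `jbd2_submit_inode_data` at offset 0x70 with fault address 0x80, reached through `ext4_fc_commit` from `ext4_sync_file` during an `fsync` call.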