Fast commit ext4 option

Hello friends,

Is anyone using ext4 with fast commit enabled?
I was running kernel 5.15.102, and since I enabled fast commit I have experienced storagenode software hangs on multiple nodes, along the lines of what is shown in the kernel log below.
I saw there were some ext4 fixes in newer 5.15 kernel versions, so I decided to update to 5.15.142, but yesterday and today the issue appeared again on two updated nodes running on different drives.
The underlying drives and hypervisor are all fine, and I had zero issues with ext4 before enabling fast commit, which is apparently supposed to be stable at this point.
If you are running ext4 with this option enabled, which kernel version are you using?
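
For reference, this is roughly how I enabled and checked the feature on the data drives (just a sketch; /dev/sdb1 and the paths are placeholders, not my actual devices):

# enable the fast_commit journal feature (needs e2fsprogs 1.46 or newer)
tune2fs -O fast_commit /dev/sdb1
# confirm the flag shows up in the feature list
dumpe2fs -h /dev/sdb1 | grep -i features
# fast commit statistics, if the running kernel exposes them
cat /proc/fs/ext4/sdb1/fc_info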

Thank you.

[Fri Jan 12 15:15:45 2024] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
[Fri Jan 12 15:15:45 2024] ext4 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[Fri Jan 12 15:47:17 2024] hrtimer: interrupt took 2569412 ns
[Fri Jan 12 15:55:57 2024] kworker/dying (2916) used greatest stack depth: 12296 bytes left
[Sat Jan 13 00:48:26 2024] kworker/dying (7525) used greatest stack depth: 12200 bytes left
[Sat Jan 13 04:12:41 2024] kworker/dying (10041) used greatest stack depth: 12008 bytes left
[Sat Jan 13 14:59:24 2024] BUG: kernel NULL pointer dereference, address: 0000000000000080
[Sat Jan 13 14:59:24 2024] #PF: supervisor read access in kernel mode
[Sat Jan 13 14:59:24 2024] #PF: error_code(0x0000) - not-present page
[Sat Jan 13 14:59:24 2024] PGD 800000011e399067 P4D 800000011e399067 PUD 11e39a067 PMD 0 
[Sat Jan 13 14:59:24 2024] Oops: 0000 [#1] SMP PTI
[Sat Jan 13 14:59:24 2024] CPU: 0 PID: 2389 Comm: storagenode Not tainted 5.15.142-gentoo #1
[Sat Jan 13 14:59:24 2024] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[Sat Jan 13 14:59:24 2024] RIP: 0010:jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024] Code: 01 48 8b 77 20 85 c0 7f 4c 48 c7 44 24 08 00 00 00 00 48 8b 7e 30 48 89 e6 48 c7 44 24 20 00 00 00 00 c7 44 24 20 01 00 00 00 <48> 8b 87 80 00 00 00 48 01 c0 48 89 04 24 48 8b 43 30 48 89 44 24
[Sat Jan 13 14:59:24 2024] RSP: 0018:ffffc90000897dc0 EFLAGS: 00010246
[Sat Jan 13 14:59:24 2024] RAX: 0000000000000000 RBX: ffff8880045418c0 RCX: 0000000000000000
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000001 RSI: ffffc90000897dc0 RDI: 0000000000000000
[Sat Jan 13 14:59:24 2024] RBP: ffffc90000897ed8 R08: 0000000000000006 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000238 R11: ffffffffffffffff R12: ffffc90000897e98
[Sat Jan 13 14:59:24 2024] R13: ffff888013a9f958 R14: ffff888101725e80 R15: ffffc90000897e80
[Sat Jan 13 14:59:24 2024] FS:  00007efc8d685b38(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[Sat Jan 13 14:59:24 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080 CR3: 000000011a658004 CR4: 0000000000170ef0
[Sat Jan 13 14:59:24 2024] Call Trace:
[Sat Jan 13 14:59:24 2024]  <TASK>
[Sat Jan 13 14:59:24 2024]  ? __die_body.cold+0x1a/0x1f
[Sat Jan 13 14:59:24 2024]  ? page_fault_oops+0xa9/0x250
[Sat Jan 13 14:59:24 2024]  ? search_exception_tables+0x33/0x50
[Sat Jan 13 14:59:24 2024]  ? search_module_extables+0x5/0x40
[Sat Jan 13 14:59:24 2024]  ? exc_page_fault+0x71/0x140
[Sat Jan 13 14:59:24 2024]  ? asm_exc_page_fault+0x22/0x30
[Sat Jan 13 14:59:24 2024]  ? jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024]  ext4_fc_commit+0x29a/0x8d0
[Sat Jan 13 14:59:24 2024]  ? file_check_and_advance_wb_err+0x27/0xb0
[Sat Jan 13 14:59:24 2024]  ext4_sync_file+0xd8/0x340
[Sat Jan 13 14:59:24 2024]  __x64_sys_fsync+0x32/0x60
[Sat Jan 13 14:59:24 2024]  do_syscall_64+0x42/0x90
[Sat Jan 13 14:59:24 2024]  entry_SYSCALL_64_after_hwframe+0x62/0xcc
[Sat Jan 13 14:59:24 2024] RIP: 0033:0x4074ce
[Sat Jan 13 14:59:24 2024] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[Sat Jan 13 14:59:24 2024] RSP: 002b:000000c00137a968 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
[Sat Jan 13 14:59:24 2024] RAX: ffffffffffffffda RBX: 0000000000000077 RCX: 00000000004074ce
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000077
[Sat Jan 13 14:59:24 2024] RBP: 000000c00137a9a8 R08: 0000000000000000 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[Sat Jan 13 14:59:24 2024] R13: 22aeffeeaeaeaaa0 R14: 000000c008bf5380 R15: 0000000000000090
[Sat Jan 13 14:59:24 2024]  </TASK>
[Sat Jan 13 14:59:24 2024] Modules linked in:
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080
[Sat Jan 13 14:59:24 2024] ---[ end trace fb6b44467200858f ]---
[Sat Jan 13 14:59:24 2024] RIP: 0010:jbd2_submit_inode_data+0x70/0xd0
[Sat Jan 13 14:59:24 2024] Code: 01 48 8b 77 20 85 c0 7f 4c 48 c7 44 24 08 00 00 00 00 48 8b 7e 30 48 89 e6 48 c7 44 24 20 00 00 00 00 c7 44 24 20 01 00 00 00 <48> 8b 87 80 00 00 00 48 01 c0 48 89 04 24 48 8b 43 30 48 89 44 24
[Sat Jan 13 14:59:24 2024] RSP: 0018:ffffc90000897dc0 EFLAGS: 00010246
[Sat Jan 13 14:59:24 2024] RAX: 0000000000000000 RBX: ffff8880045418c0 RCX: 0000000000000000
[Sat Jan 13 14:59:24 2024] RDX: 0000000000000001 RSI: ffffc90000897dc0 RDI: 0000000000000000
[Sat Jan 13 14:59:24 2024] RBP: ffffc90000897ed8 R08: 0000000000000006 R09: 0000000000000000
[Sat Jan 13 14:59:24 2024] R10: 0000000000000238 R11: ffffffffffffffff R12: ffffc90000897e98
[Sat Jan 13 14:59:24 2024] R13: ffff888013a9f958 R14: ffff888101725e80 R15: ffffc90000897e80
[Sat Jan 13 14:59:24 2024] FS:  00007efc8d685b38(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[Sat Jan 13 14:59:24 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan 13 14:59:24 2024] CR2: 0000000000000080 CR3: 000000011a658004 CR4: 0000000000170ef0

The storagenode process isn't killed as far as I can tell, at least not every time; the PID was still there in the first few cases where I checked what exactly had happened, but it definitely gets stuck in a way that makes it impossible to shut it down or restart it gracefully.
The load averages also skyrocket when this happens, I guess because of iowait.
My dirty solution so far was to kill the whole VM and start it again, as I was not sure whether this is something recoverable without a reboot.
I also believe I checked the mount a couple of times and it was still there, but listing it with ls was a no-go; ls got stuck as well.
I will, however, disable fast commit for now as advised.
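
In case it helps anyone else, this is roughly what I plan to run (a sketch; I believe the feature flag can be cleared with tune2fs, but I will unmount the filesystem first to be safe, and the device and mountpoint names are placeholders):

# stop the node, then unmount the data filesystem
umount /mnt/storagenode
# clear the fast_commit feature flag
tune2fs -O ^fast_commit /dev/sdb1
# optional sanity check before mounting again
fsck.ext4 -f /dev/sdb1
mount /mnt/storagenode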
Thank you.

There can be several storagenode processes; the lazy filewalker, for example, runs as a separate process, and it was likely the one that got killed. You can check that in the storagenode's logs.
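Something along these lines should show it (assuming a docker setup; the exact log tag for the lazy filewalker may differ):

docker logs storagenode 2>&1 | grep -i lazyfilewalker | tail -n 20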

I experienced another case today, and the very last thing in the log was "INFO servers slow shutdown", with what appears to be a dump of some sort.
Six hours later, when I discovered the issue, the storagenode process was still present in the process list.
The funny thing is that it happened on a node that already had fast_commit disabled; it nevertheless got stuck with the same kernel error as before.
It would thus appear that it isn't enough to just disable fast_commit; a remount is needed as well to stop the problem from recurring. At least I hope it won't happen again, otherwise an ext4 to ext4 copy will be a new discipline for me for the next couple of weeks at least :slight_smile:.
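
To be sure, after the next remount I will verify the state roughly like this (again a sketch with placeholder device and mountpoint names):

# fast_commit should no longer appear in the feature list
dumpe2fs -h /dev/sdb1 | grep -i features
# and the filesystem should be mounted and responsive again
findmnt /mnt/storagenode
ls /mnt/storagenode > /dev/null && echo "mount responds"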