Yes. Use an external journal and place it on your NVMe storage (assuming you have at least a half-decent NVMe device).
It’s a T-FORCE TM8FP7001T. I chose it for its very high write endurance rating.
I don’t know how to do that. And if the NVMe dies, the journal dies with it; can that affect the storage drives? If I replace the broken NVMe and reinstall the OS, can the journal be recreated without destroying the storage data?
A journal is effectively only a very short-term backup of data that will land somewhere else on the filesystem anyway a few seconds later. It is never read from except when recovering from an unclean shutdown, and a clean shutdown leaves the journal empty.
If the NVMe fails while the HDD still works, the filesystem can still be shut down cleanly. The journal can also be moved back to the main block device at any time.
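For reference, a rough sketch of the commands involved; the device names (/dev/nvme0n1p1 for the NVMe partition, /dev/sda1 for the HDD filesystem) are placeholders for your own setup, so double-check them before running anything:
# create a dedicated journal device on the NVMe partition (block size must match the data filesystem, usually 4096)
sudo mke2fs -O journal_dev -b 4096 /dev/nvme0n1p1
# either point at it when formatting the HDD:
sudo mkfs.ext4 -J device=/dev/nvme0n1p1 /dev/sda1
# or attach it to an existing, unmounted ext4 filesystem:
sudo tune2fs -O ^has_journal /dev/sda1
sudo tune2fs -J device=/dev/nvme0n1p1 /dev/sda1
# and to move the journal back onto the HDD later:
sudo tune2fs -O ^has_journal /dev/sda1
sudo tune2fs -j /dev/sda1
tune2fs may ask you to run e2fsck -f on the unmounted filesystem first; that is expected.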
In my opinion, use gdisk to create a partition, then just use
sudo mkfs.ext4 /dev/sda1
or set up LVM first (to get several useful features without a huge performance impact), and format the logical volume as usual ext4 with default parameters.
And add this parameter to mkfs:
-e remount-ro
And add this parameter to your fstab entry:
errors=remount-ro
This will switch the filesystem to read-only mode if any filesystem error occurs. If that happens, stop the node, unmount the filesystem and run fsck on it to fix the error.
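For example, a full fstab entry could look something like this (the UUID and mount point are placeholders, not values from this thread):
UUID=<filesystem-uuid> /mnt/storagenode ext4 defaults,noatime,errors=remount-ro 0 2
The last two numbers are the dump and fsck-pass fields discussed just below.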
Should I set “defaults,noatime 0 0” or “0 2” in fstab?
Should I let it run the filesystem check at boot?
Does it take a long time to run when the HDD is almost full?
I tried the “0 2”, and after a reboot there was some intense activity on both drives for 5-10 seconds, and then just continuous “blinking” for minutes without stopping, until I changed it to “0 0” and rebooted.
Now there is just the intense 5-10 sec boot activity on them.
I wonder if that was the fs scan, or whether some heads were just “unsettled”.
The last two columns are “dump” (not used, so always zero) and the filesystem check pass (0 = don’t check, 2 = check after everything marked 1 has been checked, which should only be the root filesystem).
Yep, I got that, but what would you recommend? Does the scan take long?
Perhaps it shouldn’t, unless you have abrupt restarts.
Since most distros migrated to systemd, systemd decides whether to check the filesystems on boot if they are marked dirty (i.e. not cleanly unmounted). For a few (stupid) reasons this sometimes fails, which drops the boot process into recovery (i.e. if your node is remote, you have to get there to fix it). I prefer the system to come back online as soon as possible; then I remote into the system, stop the nodes, unmount the filesystems and check them manually.
Man, when you switch from a GUI OS, especially Windows, to a CLI one like Ubuntu Server, you start learning so much stuff from under the hood.
I spent hours and hours reading and trying to understand each command and each parameter, just to get the basic install done.
@Toyoo Would it reduce directory fragmentation if storj put all new blocks into a single directory? A new directory could be created when a max file limit is reached or a new week begins.
I mention this idea here:
File adds would only happen in the latest directory, and in all other directories only file delete operations (move to trash) would be executed.
This is a trade-off resulting in longer piece expiration processes and downloads for low-memory nodes.
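Just to make the rotation idea concrete, a tiny sketch in Go (the directory naming and function name are invented here, this is not how storagenode lays out blobs):
package pieces

import (
	"fmt"
	"path/filepath"
	"time"
)

// weeklyDir returns the directory new pieces would land in: everything
// written during the current ISO week goes into one directory, and older
// directories only ever see deletes / moves to trash.
func weeklyDir(base string, now time.Time) string {
	year, week := now.UTC().ISOWeek()
	return filepath.Join(base, fmt.Sprintf("%04d-w%02d", year, week))
}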
What information does this process require?
If the entry size is 124 bytes, then an in-memory dictionary/array would require around 122 MB of memory per 1 million entries. Most likely the entry size is less than that.
Then I read in the forum that there was already some kind of sqlite lookup db in place,
but it had understandable issues with locking.
A simple journal with a near-1:1 memory layout and append-only operations would already be enough to act as persistent storage for the in-memory table.
To me it simply makes little sense that storj needs a dedicated HDD, or needs to hold all fs metadata in the system cache, just to be able to operate or to share the disk with some other process like a btc node.
As far as I know, it goes through all records in the expiration database and deletes them one by one. This is repeated every 24 hours since the node started.
Satellite ID, piece ID, expiration time.
Ok, and?
If it is append only, this means that pieces deleted by GC are not removed from this journal? Or, well, expired pieces?
It is simple, that’s the key benefit. There were ideas to make it more complex for speed. More complexity is risky though.
@Toyoo I think the first implementation used an ACID data store (sqlite) and ran into the well-known lock contention issue.
Then storj switched to the file-walker, which created the random-access IO issue,
and everyone suggests overcoming it by using a filesystem cache in memory or on flash.
There are no cheap, simple and optimal options for that, at least I could not find one.
The optimal way to go would be to hold the required data in a better-suited data store.
A journal would be the best fit. It does not require locking the ACID way. All operations like add, update, delete or “expire” are appended to the end of the file. The consumer then needs to read all operations in sequence and apply them to the (in-memory) state. Another name for this would be “event sourcing”.
Most implementations use a hybrid approach where the state is periodically written to disk to reduce load time. On restart this snapshot is loaded, and all operations after its timestamp are read from the journal and applied to reproduce the latest state.
Entries made before the snapshot can be purged at any time.
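A minimal sketch of the pattern in Go, just to make the idea concrete; the record layout, operation names and file format here are made up for illustration, not how storagenode stores anything:
// Package journal is a toy sketch of the append-only idea above.
package journal

import (
	"bufio"
	"encoding/json"
	"os"
	"time"
)

// Op is one appended entry: the operation plus the three values mentioned
// earlier (satellite ID, piece ID, expiration time).
type Op struct {
	Kind       string    `json:"kind"` // "add", "delete" or "expire"
	Satellite  string    `json:"satellite"`
	PieceID    string    `json:"piece"`
	Expiration time.Time `json:"expiration"`
}

// Append writes one operation to the end of the journal file.
func Append(path string, op Op) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	line, err := json.Marshal(op)
	if err != nil {
		return err
	}
	_, err = f.Write(append(line, '\n'))
	return err
}

// Replay reads all operations in sequence and rebuilds the in-memory state
// (piece ID -> expiration). Deletes and expirations drop the entry again.
func Replay(path string) (map[string]time.Time, error) {
	state := make(map[string]time.Time)
	f, err := os.Open(path)
	if os.IsNotExist(err) {
		return state, nil // no journal yet, empty state
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var op Op
		if err := json.Unmarshal(sc.Bytes(), &op); err != nil {
			return nil, err
		}
		switch op.Kind {
		case "add":
			state[op.PieceID] = op.Expiration
		case "delete", "expire":
			delete(state, op.PieceID)
		}
	}
	return state, sc.Err()
}
The snapshot from the hybrid variant would then just be this map serialized to disk every so often together with the journal offset it covers; on restart you load the snapshot and replay only the tail.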
I would never recommend implementing your own database engine,
but the journal is more of a pattern and can be implemented with nearly all languages and system-provided features.
Instead of the in-memory state, sqlite could be used again; the journal would then in a sense be the “write buffer”.
Sorry, I do not see this solution myself. You would probably have to explain how you would implement all the operations a storage node performs on this data structure, e.g. how you remove a piece that was deleted before its expiration time, etc. Without that I can’t discuss it without filling in the missing parts by guessing. Besides, 122 MB per 1 M entries is a lot. We have nodes which handle upwards of 70M pieces now, which would suggest spending close to 10 GB of RAM just for piece expiration.
@Toyoo This would be the index/table of all files, used for locating the chunk/file and for other operations like scavenging. The data structure does not need to be held fully in memory.
A boltdb or the already-used sqlite could still be applied,
but the main idea is to use a journal in front of it to decouple the read side from the write side.
The db itself could then be relocated to flash storage, and a filesystem cache would not be required anymore.
The node can use the db for all its operations, mainly scavenging, statistics and as a file locator. Only content read/write operations would require access to the chunk storage.
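To illustrate the decoupling, a rough sketch in Go (the names and the Store interface are invented for this example; in practice sqlite or boltdb would sit behind it): the write path only ever appends, and a background loop pushes batches into the database.
// Package writebuffer sketches the "journal in front of the db" idea.
package writebuffer

import (
	"sync"
	"time"
)

// Op mirrors the journal entry from the earlier sketch.
type Op struct {
	Kind, Satellite, PieceID string
	Expiration               time.Time
}

// Store stands in for whatever database sits behind the buffer
// (sqlite, boltdb, ...); only batch writes ever reach it.
type Store interface {
	ApplyBatch(ops []Op) error
}

// Buffer collects operations from the write path and flushes them to the
// Store in batches, so readers of the db never contend with single writes.
// In a real setup the pending slice would be backed by the on-disk journal.
type Buffer struct {
	mu      sync.Mutex
	pending []Op
	store   Store
}

// New starts a background flusher that drains the buffer periodically.
func New(store Store, every time.Duration) *Buffer {
	b := &Buffer{store: store}
	go func() {
		for range time.Tick(every) {
			_ = b.Flush()
		}
	}()
	return b
}

// Append is the only thing the write path does: record the operation.
func (b *Buffer) Append(op Op) {
	b.mu.Lock()
	b.pending = append(b.pending, op)
	b.mu.Unlock()
}

// Flush applies everything collected so far to the database in one batch.
func (b *Buffer) Flush() error {
	b.mu.Lock()
	batch := b.pending
	b.pending = nil
	b.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}
	return b.store.ApplyBatch(batch)
}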
A related example would be bitcoin-core; it has a similar approach with its index files, but there the journal would be the block chunks themselves.