Ext4 Optimization recap

Small operators are growing, and one day they will realize they need to optimize the entire system to survive.
Can we have a recap of the suggested optimizations for an ext4-based server (ext4 being the most common filesystem in the Storj ecosystem)?


Why would you need to optimize the filesystem? As far as I can see, the filesystem is almost never the most constraining factor. In my case the download speed on average doesn’t exceed 1 MiB/s, which never saturates the disk cache either.

Download speed is irrelevant; IOPS is what matters. 1 Mbps of 1 kB files is at least 125 IOPS, and that’s before counting atime updates and metadata writes (mostly batched, but it adds up). Hard drives cannot do more than about 200 IOPS, so even today it’s pushing it.
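The arithmetic behind the 125 IOPS figure, as a small shell sketch: 1 Mbps is 125,000 bytes per second, which at 1,000 bytes per file is 125 files per second. The function name `iops_needed` is hypothetical, and it assumes exactly one I/O per file, so the real load is higher once atime and metadata writes are added.

```shell
#!/bin/sh
# Rough estimate of the random IOPS needed to sustain a throughput made of small files.
# Assumption: one I/O per file; atime updates and metadata writes come on top of this.
iops_needed() {
  bits_per_s=$1   # throughput in bits per second (1 Mbps = 1000000)
  file_bytes=$2   # average file size in bytes (1 kB = 1000)
  echo $(( bits_per_s / 8 / file_bytes ))   # integer division, rounds down
}

iops_needed 1000000 1000   # prints 125
```

Compared with the roughly 200 IOPS a hard drive can deliver, that leaves little headroom.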

Therefore you definitely want to tune for the use case: at least disable atime updates, disable sync (especially for databases; or, if you don’t have reliable power and have to rely on sync, move the databases to an SSD), maybe tune the sector size, etc. But I’m no ext4 expert, so let someone more knowledgeable chime in.
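A minimal sketch of the atime part only, assuming the node’s data lives on its own ext4 volume: the standard `noatime` mount option stops plain reads from triggering access-time metadata writes. The helper `fstab_entry`, the UUID and the mount point are hypothetical placeholders; the sync and sector-size suggestions are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: prints an /etc/fstab entry that mounts an ext4 storage
# volume with noatime. UUID and mount point are placeholders, not thread values.
fstab_entry() {
  uuid=$1
  mountpoint=$2
  printf 'UUID=%s %s ext4 defaults,noatime 0 2\n' "$uuid" "$mountpoint"
}

fstab_entry 0000-example /mnt/storagenode
# Look up the real UUID with:  blkid /dev/sdX1
# Apply to an already mounted volume without a reboot (needs root):
#   sudo mount -o remount,noatime /mnt/storagenode
```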


I see you are volunteering. Great, I’d love to see someone collect all these nuggets of information! You could probably start by just typing ext4 into the forum’s search tool; I see at least two relevant threads in the search preview alone.

My intention was to provide a general summary for everyone, as simple as possible: a defrag of the forum’s info about filesystem management :slight_smile:

Again, I doubt whether the filesystem itself poses a real problem. I have six nodes running: two rely on a single HDD, one is pure SSD, and three use a mix of microSD cards, HDDs and SSDs. I don’t see any substantial difference between them; all have an ingress of 30-50 GB/day, with growth of 15-20 GB/day.

Yeah, please do!

That said, I’d hope this kind of information could be included in the official documentation as well; I feel that’s where it belongs.

I’m thinking about my filewalker taking 24 hours on a normal SATA server HDD with 10 TB of data, reading at 10-15 MB/s… (ext4 filesystem; on ZFS, I understand an L2ARC helps a lot.)

I just ran a dumb experiment.
I created two virtual disks and connected them to a Linux VM (Ubuntu 22.04).
I created an LVM volume on the first virtual disk, formatted it as ext4 and mounted it at /mnt/lvm-test:

sudo pvcreate /dev/sdc
sudo vgcreate vg /dev/sdc
sudo lvcreate -L 20G vg
sudo mkfs.ext4 /dev/mapper/vg-lvol0
sudo mkdir /mnt/lvm-test
sudo mount /dev/mapper/vg-lvol0 /mnt/lvm-test
sudo chown $(id -u) /mnt/lvm-test

Then I created a ZFS pool on the second virtual disk, created a zvol on it, formatted it as ext4 and mounted it at /mnt/zfs-test:

sudo zpool create test /dev/sdb
sudo zfs create -V 20G test/vg-ext4
sudo mkfs.ext4 /dev/zd0
sudo mkdir /mnt/zfs-test
sudo mount /dev/zd0 /mnt/zfs-test
sudo chown $(id -u) /mnt/zfs-test

Small dumb test

$ head /dev/urandom -c 1G > 1G.raw
$ rsync 1G.raw -P /mnt/zfs-test/
  1,073,741,824 100%  138.39MB/s    0:00:07 (xfr#1, to-chk=0/1)
$ rsync 1G.raw -P /mnt/lvm-test/
  1,073,741,824 100%  201.41MB/s    0:00:05 (xfr#1, to-chk=0/1)

So, if you want a faster disk, it should be without an underlying ZFS. What’s the point of creating a zvol there? Why not a native ZFS dataset?
Here is my test/plain ZFS dataset:

$ rsync 1G.raw -P /test/plain/
  1,073,741,824 100%  203.82MB/s    0:00:05 (xfr#1, to-chk=0/1)


$ rm /mnt/zfs-test/1G.raw
$ rsync 1G.raw -P /mnt/zfs-test/
  1,073,741,824 100%  141.39MB/s    0:00:07 (xfr#1, to-chk=0/1)

I use a zvol because I run the node inside a VM, and ZFS is the host’s underlying storage. All VMs have zvols as their virtual disks.

For a raidz pool, you should create the zvol with a 64K block size (the exact minimum varies, but too small a block size on a raidz pool with 4K-sector drives can make the zvol take up double the amount of space).
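Where the "double the space" comes from can be sketched with the rounding OpenZFS applies when sizing a raidz allocation: count the data sectors, add parity sectors per stripe row, then pad the total up to a multiple of (parity + 1) sectors. This is an approximation under stated assumptions (it ignores compression, metadata and how ZFS reports usable space), and the function name `raidz_alloc_bytes` is hypothetical. The example inputs (a 4-disk raidz1 with 4K sectors, 8K and 64K block sizes) are illustrative.

```shell
#!/bin/sh
# Sketch: raw bytes one zvol block occupies on a raidz vdev.
# Ignores compression and metadata; treat the result as an approximation.
raidz_alloc_bytes() {
  blk=$1      # zvol volblocksize in bytes
  ndisks=$2   # number of disks in the raidz vdev
  npar=$3     # parity level: 1 = raidz1, 2 = raidz2, 3 = raidz3
  sect=$4     # sector size in bytes (4096 for ashift=12)
  data=$(( (blk + sect - 1) / sect ))                               # data sectors, rounded up
  par=$(( npar * ((data + ndisks - npar - 1) / (ndisks - npar)) ))  # parity sectors per row
  total=$(( data + par ))
  pad=$(( npar + 1 ))
  total=$(( (total + pad - 1) / pad * pad ))                        # pad to a multiple of npar+1
  echo $(( total * sect ))
}

raidz_alloc_bytes 8192 4 1 4096    # 16384: 8K of data takes 16K raw, i.e. double
raidz_alloc_bytes 65536 4 1 4096   # 90112: close to the ideal 4/3 overhead of 4-disk raidz1
```

The block size of a zvol is fixed at creation time through the `volblocksize` property, e.g. `zfs create -V 20G -o volblocksize=64K test/vg-ext4`.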

Add another node to split the IOPS… problem solved…

Yup, but that also means an additional drive, and it halves the ingress per node. So it probably doesn’t solve that much in the end, unless you had already planned on starting more than one node.

Actually, I don’t think this experiment says much about the Storj use case. Files are on average below 1 MB, so they don’t benefit from sequential writes, and there are a lot more metadata writes.

I agree with you. But you can run such a test yourself with many files and the result will not change: ext4 on a zvol will be slower than ext4 on LVM/MBR, and slower than a ZFS dataset.
Also, a ZFS dataset (on a simple single-disk pool) will be slower than ext4 on LVM (single disk)/MBR for many files.
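A rough many-small-files variant of the 1G.raw test, as a shell sketch: it writes a number of small random files into a directory and times the run, ending with `sync` so dirty pages are flushed and the timing is not just the page cache. The function name `write_small_files`, the file count and the file size are illustrative assumptions, not values from the thread; real storagenode pieces vary in size, and `date +%s` only has one-second resolution, so use enough files for the difference to show.

```shell
#!/bin/sh
# Sketch: write many small random files and time it, to compare mount points.
write_small_files() {
  dir=$1       # target directory, e.g. /mnt/lvm-test/small or /mnt/zfs-test/small
  count=$2     # number of files to write
  size_kb=$3   # size of each file in KiB
  mkdir -p "$dir" || return 1
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$count" ]; do
    head -c "$((size_kb * 1024))" /dev/urandom > "$dir/piece_$i.bin" || return 1
    i=$((i + 1))
  done
  sync   # flush dirty pages so the timing includes writing to the disk
  end=$(date +%s)
  echo "wrote $count files of ${size_kb} KiB to $dir in $((end - start)) s"
}

# Example (run once per mount point and compare):
#   write_small_files /mnt/lvm-test/small 10000 16
#   write_small_files /mnt/zfs-test/small 10000 16
```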

Are these ext4 limitations on a Synology NAS a problem for storagenodes? Is there a recommended maximum allocated space for a node under these conditions?

I think these are huge problems that can’t be overcome.
If you search previous posts, you’ll see that many problems arose for people using ext4, especially those who used 200 TiB volumes or symbolic links, and those who tried to download 16 TiB segments from Storj. You won’t find any of those people or nodes still alive… :wink:

But really, why are you asking, when ext4 is one of the most recommended filesystems, as opposed to, for example, FAT, exFAT or btrfs? See the many examples, if you can find the search function, such as 1 and 2.


:grin: