On tuning ext4 for storage nodes

My understanding was that ext4 stored a directory's entries (the file names plus references to the files' inodes) on the directory's inode, and that when that inode became full, it referenced another inode holding further references.

If the extra directory entries are instead stored in whole additional blocks, then that assumption is wrong, and directory fragmentation can hardly be improved.

I’m currently testing with 128-byte inodes too.

Does it really matter?
How much bandwidth does your node have?

Any final conclusions after a year of testing? Is it safe to format ext4 with 128-byte inodes to gain about 30% in performance?

On Windows this can be done with PrimoCache. For read-only caching you don’t need a UPS.
Use an SSD/NVMe of 32 GB or more, up to several hundred GB depending on the size of the node.
Mine uses 21 GB of 500 GB for a 1 TB node.

As soon as you enable write caching (whether on SSD, NVMe, or RAM), you need a UPS suited to your system to prevent data loss.

2023-10-31 14:57:15.718 INFO: dbg_print_file_counters: tiny … < 10 KB: 1407945
2023-10-31 14:57:15.718 INFO: dbg_print_file_counters: small … < 100 KB: 5359171
2023-10-31 14:57:15.718 INFO: dbg_print_file_counters: average … < 1 MB: 972818
2023-10-31 14:57:15.718 INFO: dbg_print_file_counters: big … < 16 MB: 331316

Those 331316 are probably all the 2.4 MB Storj files.
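To get a similar breakdown for any directory on a Linux node, here is a rough sketch using find and awk. The bucket limits mirror the log lines above; treating KB/MB as 1024-based units, the extra "huge" bucket, and the function name are my assumptions, not how the tool that printed that log counts.

```shell
#!/usr/bin/env bash
# Rough file-size histogram, bucketed like the log above.
# Assumption: KB/MB are 1024-based; files of 16 MB or more go to "huge".
set -euo pipefail

count_file_sizes() {
    local dir=$1
    # Print each regular file's size in bytes (GNU find), then bucket with awk.
    find "$dir" -type f -printf '%s\n' | awk '
        $1 < 10 * 1024        { tiny++;    next }
        $1 < 100 * 1024       { small++;   next }
        $1 < 1024 * 1024      { average++; next }
        $1 < 16 * 1024 * 1024 { big++;     next }
                              { huge++ }
        END {
            printf "tiny < 10 KB: %d\n",   tiny
            printf "small < 100 KB: %d\n", small
            printf "average < 1 MB: %d\n", average
            printf "big < 16 MB: %d\n",    big
            printf "huge >= 16 MB: %d\n",  huge
        }'
}

# Usage (hypothetical path): count_file_sizes /mnt/storj/storage/blobs
```

Note that -printf is a GNU find extension; BusyBox-based systems may not support it.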

Well, it still works for me. But we now know that the gain is not always 30%. Smaller inodes mean less memory is needed to cache them, which can make a huge difference if you have between 0.5 and 1 GB of RAM per TB of stored data. At that ratio, with standard inodes you pretty much need an SSD for decent operation, while with smaller inodes you can easily get by without one.
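To put the memory argument into rough numbers, here is a back-of-the-envelope sketch using the file counts from the log above. It only multiplies the number of files by the on-disk inode size (128 bytes vs the usual 256), assuming one inode per file and ignoring directories, the dentry cache, and the kernel's own in-memory inode structures.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope: MiB of on-disk inodes to cache so every file's
# metadata stays in RAM. Assumes one inode per file; ignores directories,
# dentries and in-kernel inode structures.
set -euo pipefail

# File counts from the dbg_print_file_counters log lines above.
files=$(( 1407945 + 5359171 + 972818 + 331316 ))

inode_cache_mib() {
    local inode_size=$1
    echo $(( files * inode_size / 1024 / 1024 ))
}

echo "files: $files"
echo "256-byte inodes: $(inode_cache_mib 256) MiB"
echo "128-byte inodes: $(inode_cache_mib 128) MiB"
```

For these 8071250 files that works out to about 1970 MiB with 256-byte inodes versus about 985 MiB with 128-byte inodes, which is why the difference shows up mainly on RAM-constrained nodes.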

On the other hand, if you have more than 1 GB of RAM per TB of stored data, you will barely see a difference, and only during the first file walker run that hits the drive.


I moved my databases to a USB stick (Synology DS220+), and after checking with mount, I see that it is mounted with relatime. How can I change it to noatime? Should I stop the node before applying the change?
Is there anything else I can do to prolong the stick’s life?

mount

/dev/mapper/vg1-volume_1 on /volume1 type ext4 (rw,nodev,noatime,synoacl,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group)
/dev/mapper/vg2-volume_2 on /volume2 type ext4 (rw,nodev,noatime,synoacl,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group)
/dev/usb1p1 on /volumeUSB1/usbshare type ext4 (rw,relatime,synoacl,data=ordered)

In /etc/fstab (opened with vi) I see this:

none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1/volume_1 /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl,noatime,nodev 0 0
/dev/vg2/volume_2 /volume2 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl,noatime,nodev 0 0

Can I add this?

/dev/usb1p1 /volumeUSB1/usbshare ext4 synoacl,noatime 0 0

or this?

/dev/usb1p1 /volumeUSB1/usbshare ext4 synoacl,noatime
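The only difference between the two candidate lines is the last two fields. In fstab, the fifth field (dump) and the sixth field (fsck pass) default to 0 when omitted, so both lines should mean the same thing. This small sketch splits an fstab line into its six named fields to make that visible; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Split an fstab line into its six fields (see fstab(5)).
# Fields 5 (dump) and 6 (fsck pass) default to 0 when they are omitted.
set -euo pipefail

describe_fstab_line() {
    local dev mnt fstype opts dump pass
    read -r dev mnt fstype opts dump pass <<< "$1"
    echo "device=$dev mountpoint=$mnt type=$fstype options=$opts dump=${dump:-0} pass=${pass:-0}"
}

describe_fstab_line "/dev/usb1p1 /volumeUSB1/usbshare ext4 synoacl,noatime 0 0"
describe_fstab_line "/dev/usb1p1 /volumeUSB1/usbshare ext4 synoacl,noatime"
```

Both calls print the same description, ending in dump=0 pass=0.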

I put this line in /etc/fstab with vi:

/dev/usb1p1 /volumeUSB1/usbshare ext4 synoacl,noatime 0 0

I saved it with :wq, reopened it with vi to make sure it was there, and quit with :q!.
I rebooted the Synology DS, and /etc/fstab doesn’t have that line anymore.
The mount command still shows relatime.
What should I do next? How can I make it permanent?

DSM doesn’t have this option for USB drives.
I found this command on a forum:

mount -o remount,noatime "mount_point"

Which part of /dev/usb1p1 /volumeUSB1/usbshare should I use as mount_point?
Should I put it in a startup script? What should the script look like?

I found that command in this script:

#!/usr/bin/env bash
# Author: Jan Christoph Uhde (2021)

# Variables that control this script ##########################################
rcl_noatime_volumes="1 2"
# Variables that control this script - END ####################################
## helper
rcl_error() {
    echo "ERROR (rc.local): $*"
}
## functions
remount_noatime() {
    # remount volumes noatime
    for rcl_i in $rcl_noatime_volumes; do
        rcl_mount_point="/volume$rcl_i"
        if [ -d "$rcl_mount_point" ]; then
            mount -o remount,noatime "$rcl_mount_point" || rcl_error "failed to remount $rcl_mount_point noatime"
        fi
    done
}
## main (calling the functions)
remount_noatime

Got it!
I put it in a startup script that runs as root at boot:
mount -o remount,noatime "/volumeUSB1/usbshare"

So my startup script for DSM is:

sysctl -w net.core.rmem_max=2500000
sysctl -w net.ipv4.tcp_fastopen=3
mount -o remount,noatime "/volumeUSB1/usbshare"
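A slightly more defensive variant of that startup line, as a sketch: it reads the current options for the mount point from /proc/mounts and only remounts when noatime is missing. Reading /proc/mounts, the helper names, and the optional mounts-file argument (there only to make it testable) are my assumptions, not something DSM requires.

```shell
#!/usr/bin/env bash
# Remount a mount point with noatime only if it is not already noatime.
set -euo pipefail

# Print the options of a mount point from a mounts file
# (default /proc/mounts: device mountpoint fstype options dump pass).
# If the path is mounted more than once, the last (topmost) entry wins.
mount_options() {
    local mnt=$1 mounts_file=${2:-/proc/mounts}
    awk -v mp="$mnt" '$2 == mp { opts = $4 } END { print opts }' "$mounts_file"
}

# Succeed if a comma-separated option list contains noatime.
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

ensure_noatime() {
    local mnt=$1
    if has_noatime "$(mount_options "$mnt")"; then
        echo "$mnt already mounted noatime"
    else
        mount -o remount,noatime "$mnt"
    fi
}

# Usage at boot (as root): ensure_noatime /volumeUSB1/usbshare
```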

You could copy the line from /etc/mtab and replace relatime with noatime, but I think this one should work too.
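The /etc/mtab approach can be sketched like this: print the current entry for a mount point with relatime swapped for noatime, ready to paste into fstab. It replaces only the exact relatime option, not substrings. The function name and the optional file argument are illustrative.

```shell
#!/usr/bin/env bash
# Print the mtab entry for a mount point with relatime replaced by noatime.
set -euo pipefail

mtab_line_noatime() {
    local mnt=$1 mtab=${2:-/etc/mtab}
    awk -v mp="$mnt" '
        $2 == mp {
            n = split($4, opt, ",")
            out = ""
            for (i = 1; i <= n; i++) {
                if (opt[i] == "relatime") opt[i] = "noatime"   # exact option match only
                out = out (i > 1 ? "," : "") opt[i]
            }
            $4 = out    # awk rebuilds the line with single spaces
            print
        }' "$mtab"
}

# Usage: mtab_line_noatime /volumeUSB1/usbshare
```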

On reboot, Synology recreates fstab and removes whatever I put in there. But the mount script works.
I was wondering: if fstab is recreated on each boot, where does Syno take the values from?
The values I set in DSM, like noatime for the HDDs, are kept from boot to boot.
Maybe if I can find the file with those settings, I can add the USB line there too, or change the existing one, and fstab will be recreated accordingly.
Is mtab the one I’m looking for? Can I edit it?

Unlikely. It just reflects what is currently mounted.

I set up a machine just for Storj, and I have some questions about the filesystem options you mentioned in the first post. Note that I’m a total noob with Linux and filesystems.
Specs: Intel N100, 32GB RAM, 1x NVMe 1TB for the OS, 2x Exos 22TB for 2x Storj nodes, fresh Ubuntu Server install.
I fast-formatted the Exos drives to 4Kn and set up a GPT partition table on each.
The devices are /dev/sda and /dev/sdb.

sudo fdisk /dev/sda
m   (print the help menu)
g   (create a new empty GPT partition table)
w   (write the table to disk and exit)
Disk /dev/sda: 20.01 TiB, 22000969973760 bytes, 5371330560 sectors
Disk model: ST22000NM001E-3H
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: xxx
First LBA: 256
Last LBA: 5371330554
Alternative LBA: 5371330559
Partition entries LBA: 2
Allocated partition entries: 128

Now I have to set up the filesystem on each drive. I see this in the mke2fs manual:

 -J journal-options
    Create the ext3 journal using options specified on the command-line. Journal options are comma separated,
    and may take an argument using the equals ('=') sign. The following journal options are supported:

    size=journal-size
    Create an internal journal (i.e., stored inside the file system) of size journal-size
    megabytes. The size of the journal must be at least 1024 file system blocks (i.e., 1MB if using
    1k blocks, 4MB if using 4k blocks, etc.) and may be no more than 10,240,000 file system
    blocks or half the total file system size (whichever is smaller)

Can I use this command on my drives too?

sudo mke2fs -t ext4 -m 0 -i 65536 -I 128 -J size=128 -O sparse_super2 -L Storj1 /dev/sda

or should I use -J size=4MB?
Any modifications you recommend?
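To see what -i and -I actually change, here is rough arithmetic for the 22000969973760-byte drive shown above: -i sets the bytes per inode (which fixes how many inodes are created) and -I sets the size of each inode. It ignores mke2fs rounding inode counts per block group, so treat the results as approximate; comparing against 256 assumes that is your mke2fs default inode size.

```shell
#!/usr/bin/env bash
# Approximate inode count and inode-table size for a given mke2fs -i / -I.
# Ignores per-block-group rounding, so results are rough.
set -euo pipefail

disk_bytes=22000969973760   # from the fdisk output above

inode_table() {
    local bytes_per_inode=$1 inode_size=$2
    local inodes=$(( disk_bytes / bytes_per_inode ))
    local table_gib=$(( inodes * inode_size / 1024 / 1024 / 1024 ))
    echo "bytes_per_inode=$bytes_per_inode, inode_size=$inode_size: $inodes inodes, ~$table_gib GiB of inode tables"
}

inode_table 65536 128
inode_table 65536 256
```

Halving the inode size halves the tables (about 40 GiB versus 80 GiB here); whether that matters depends on how many of those inodes are actually in use and how much RAM you have, as discussed earlier in the thread.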

You shouldn’t change anything related to the journal. The journal is there to protect the filesystem in case of a sudden power loss. In simple terms, the journal records “I’m going to write/change/delete this data in that file”. If you lose power, those pending changes can be recovered using the journal. Without that record, the filesystem will freak out because it now contains what is essentially corrupted data.


So what would be the command to create the proper ext4 filesystem for my setup, without deviating from the standards?
I’ve read all sorts of guides, and the vast majority are outdated: written years ago, when there weren’t such big drives and nobody cared about the Storj use case.

Just stick to the recommendation in the first post.

Besides, this is really not where you will get big improvements.

1) mkfs.ext4 /dev/yourpartition              (create filesystem)
2) tune2fs -m0 /dev/yourpartition           (remove reserved blocks for root)

The defaults are perfectly fine. As mentioned, mount it with noatime (first reply).
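Put together, the two steps plus the noatime mount could be sketched as one guarded function. The safety checks (block device, not already mounted), the function name, and the mount point are my additions, not part of the recommendation above; it is destructive, so read it before running it.

```shell
#!/usr/bin/env bash
# Sketch of the steps above for a partition dedicated to one node.
# DESTRUCTIVE: mkfs.ext4 erases everything on the partition.
set -euo pipefail

prepare_storage_partition() {
    local part=$1 mnt=$2

    # Refuse anything that is not a block device (e.g. a typo like /dev/sdx1).
    if [ ! -b "$part" ]; then
        echo "refusing: $part is not a block device" >&2
        return 1
    fi
    # Refuse a partition that is currently mounted.
    if awk -v d="$part" '$1 == d { found = 1 } END { exit !found }' /proc/mounts; then
        echo "refusing: $part is mounted" >&2
        return 1
    fi

    mkfs.ext4 "$part"                # 1) create the filesystem with default options
    tune2fs -m 0 "$part"             # 2) remove the blocks reserved for root
    mkdir -p "$mnt"
    mount -o noatime "$part" "$mnt"  # 3) mount without access-time updates
    # For a permanent mount, add an fstab entry with noatime, e.g. by UUID:
    #   blkid -s UUID -o value "$part"
}

# Usage (as root, placeholder names): prepare_storage_partition /dev/sdX1 /mnt/storj1
```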


Browsing the internet, I read a guide about fdisk… and I believe I missed a step?!
It says: create a GPT partition table with g (which I did), and then create a new partition with n (which I didn’t). Other guides just say to create a new partition with n. And I don’t see a consensus on the type of the new partition.
When I tried n and pressed Enter, Enter, Enter… it was creating a Linux partition of 16 TiB, even though I have 20 TiB, so I quit.

So I created a GPT partition table with g, and /dev/sda now has a GPT partition table.
Should I now create the partition /dev/sda1 with “fdisk /dev/sda”, then n?
What should I choose?

  • primary…?
  • number…?
  • beginning and end?
  • type?
  • fs?

Thanks a lot!

Here be Dragons: this WILL blow up a disk if you don’t know what you are doing. Make sure you are operating on the correct disk.

  1. Don’t use fdisk; use gdisk /dev/device.
  2. o (create a new GUID partition table (GPT))
  3. n (create a new partition)
  4. it guides you through the partition number (since the disk will be dedicated to Storj, there will only be one partition)
  5. the start (the default is OK; don’t change it, otherwise you’ll end up with misaligned partitions, i.e. NOT aligned to physical sectors, and more than one sector might have to be read to get one sector’s worth of data)
  6. the end (the default is OK; it uses the whole disk)
  7. partition type is 8300 (Linux filesystem, which should be the default; if it’s not, change it with “t” and “1”, the type of partition 1)
  8. if you haven’t made any errors, write the changes with w (write changes to disk). If you made a mistake and want to start over, DON’T use w; use q instead (quit without saving)

You now have a disk with a partition table and a partition entry in that table (i.e. partition 1 starts on sector 2048 and ends on somethingsomething). You need to create a filesystem on that partition (i.e. follow the instructions above for ext4), then mount the drive to be used (again, see the first reply in the topic).
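If you prefer a non-interactive version of the gdisk steps, sgdisk (shipped in the same gdisk package) can do the same in one command: -o creates a new GPT, -n 1:0:0 creates partition 1 with the default start and end, and -t 1:8300 sets the Linux filesystem type. The sketch below only prints the command unless CONFIRM=yes is set; the CONFIRM variable, the function name, and /dev/sdX are my own placeholders.

```shell
#!/usr/bin/env bash
# Non-interactive equivalent of the gdisk steps: new GPT, one partition
# spanning the disk (default aligned start), type 8300 (Linux filesystem).
# DESTRUCTIVE when run for real: it wipes the existing partition table.
set -euo pipefail

single_linux_partition() {
    local disk=$1
    local cmd=(sgdisk -o -n 1:0:0 -t 1:8300 "$disk")
    echo "${cmd[*]}"
    # Only execute when explicitly confirmed.
    if [ "${CONFIRM:-no}" = "yes" ]; then
        "${cmd[@]}"
    fi
}

# Usage: single_linux_partition /dev/sdX               (prints the command only)
#        CONFIRM=yes single_linux_partition /dev/sdX   (actually runs it)
```

Printing first gives you one last chance to check that the device name is the disk you mean.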
