High IO delay with direct mount bind on ZFS


As my storage node starts filling up, the IO delay has increased dramatically; I'm no longer able to write to the drive sustainably without tanking performance.
My setup: Proxmox with ZFS,
LXC direct mount bind + Docker.

What can I do to improve performance?
Best regards

You've not provided much detail about how the ZFS is set up; if you can provide more, we might be able to help.

1 - How many HDDs, and what ZFS configuration: RAIDZ? RAIDZ2? RAIDZ3?
2 - Have you got L2ARC configured on the ZFS pool?
3 - What ashift did you use?
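
If you're not sure of any of those, a couple of commands on the Proxmox host will show them (the pool name rpool here is just an example, substitute your own):

zpool status rpool              # vdev layout, plus any cache (L2ARC) devices
zdb -C rpool | grep ashift      # the ashift actually in use on the vdevs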

The machine looks like an Intel NUC? Memory is low (~53-57 GB); you would benefit from going into the BIOS and reducing the GPU memory from auto to 256 MB, as you have a few GB of memory assigned which will never be used…

Rule of thumb for memory is around 1 GB per TB on ZFS, + 1 GB for the OS + LXC overheads + 10% spare.
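
(For the 12 TB drive in this thread that works out to roughly 12 GB + 1 GB for the OS + a bit of LXC overhead + 10%, so somewhere around 14-15 GB.)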

A quick fix to get things stable is to apply a network rate limit on the LXC; you can see it when you edit the network device. Unfortunately it applies in both directions. Start at 30 Mb/s; that might give you a chance to get things stable.
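
If you prefer the CLI over the GUI, it's the rate option on the container's network device. Note that pct set --net0 replaces the whole line, so keep your existing settings in it (container ID 101 and the interface settings here are only examples, and as far as I know Proxmox treats the rate value as MB/s):

pct set 101 --net0 name=eth0,bridge=vmbr0,ip=dhcp,firewall=1,rate=30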

Might be worth checking to make sure you haven't got a ZFS scrub running; that can tank performance.
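
You can check for one, and cancel it if needed, from the host (again, the pool name is an example):

zpool status rpool      # shows "scrub in progress" and how far along it is
zpool scrub -s rpool    # stops the running scrub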

1. It's 1 drive
2. Yes, default for Proxmox
3. Standard for Proxmox

Dell OptiPlex

It was indeed running a scrub, but stopping that only helped a bit, from 60% down to 40%.

I'm trying a 1 Mb/s data cap.

That's very low! 30 is a good place to start, maybe 25; at 1, not much will work, including Storj…

:headstone:

OK, so just checking: you have 1 physical hard disk in the machine, and you have installed ZFS on it?

If the answer is 'yes', then that is a big problem; ZFS needs at least 3 disks, ideally 5, to get its performance, plus L2ARC (SSD or NVMe).

If it is a single disk, you are better off formatting it as ext4 under Disks and letting Proxmox mount it. You will lose the data unless you can move it somewhere?

I'm guessing the node is new? Under 2 months old? If so, it's probably easier to start from scratch.
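
For reference, from the shell the ext4 route would look roughly like this (purely a sketch: /dev/sdb and /mnt/storj are made-up names, double-check the device first, and using the UUID from blkid in fstab is safer than the device path):

mkfs.ext4 /dev/sdb                                          # wipes the disk and creates the filesystem
mkdir -p /mnt/storj
echo '/dev/sdb /mnt/storj ext4 defaults 0 2' >> /etc/fstab
mount /mnt/storj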

The 1 Mb/s data cap didn't do anything for the IO delay. It's 1 drive using ZFS, and I'm not using L2ARC as cache. The node is 10 days old and has filled up to 3 TB, so I'd rather not lose my progress, especially considering I'm not 100% sure that ZFS is indeed the problem. Can I use a 30 GB SSD as cache for ZFS, or use my striped TrueNAS RAID over NFS as L2ARC?

Also, I have gigabit but my speed is capped at around 13 Mb/s.
Is there a Storj limitation?

Is it possible you’re just experiencing higher IO wait simply because the node is receiving a lot of ingress from the recent tests?

I am running all my ZFS nodes on single disks. No L2ARC, no special device, not 1 GB RAM per TB.

No problems… :wink:


Maybe, but it's not using any network traffic and it still degrades my performance significantly.

On Proxmox? On that processor? With the same disk as the OP? :thinking:

Let them know what you did to make it work.

Not on Proxmox, not the same disks; not really the same, is it?

I can only speak from a Proxmox point of view: that single-disk config on ZFS is not good, and as I said, ext4 is the preferred option. As the OP said, they aren't using L2ARC or the ZFS feature set enough to justify all that complexity for no gain…

I'll leave it there; let's try and get their IO wait down…

Wait, so you have a lot of other stuff on there!

You can find the cause of the disk I/O.

Drop to the Proxmox CLI:

apt install atop

Run atop, then press d.

Look at the DSK column at the bottom, find any process using a lot (> 40%), then look to the left and make a note of its PID.

Press q to quit atop.

Then use ps -ef | grep <pid> to find out which process is using all the disk - for example, ps -ef | grep 3252.

… If you want, post a screen grab of the atop d view, and something might jump out.


Config is as follows:
12 TB single drive in Proxmox configured as ZFS, then a direct mount bind into an LXC, then Docker.
Proxmox is installed on an SSD.

Even if I limit the download speed to 0.1 Mb/s, the IO delay is huge as long as the Docker container is running. The moment I stop the storage node, everything is normal.
Note that I'm only using the drive for Storj. I did do a Samba write test, which shows performance dropping quickly once the RAM cache fills.

I am using ZFS because it is the only option with TrueNAS SCALE. In general, a single disk is a bad idea with ZFS, but for Storj anything else would be a waste of disk space and money.

If ext4 is available, I recommend using it.


OK, you are memory-constrained in ARC. Did you add physical memory after installing Proxmox?

Show me the contents of /etc/modprobe.d/zfs.conf.
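
You can also check the limit currently in effect from the shell:

cat /etc/modprobe.d/zfs.conf
cat /sys/module/zfs/parameters/zfs_arc_max    # current ARC size limit, in bytes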

[screenshot of /etc/modprobe.d/zfs.conf posted]

Yep, that's not enough. Since Proxmox 8.0, or maybe 8.1, I can't remember, the installer sets a hard limit on the ZFS ARC of 10% of physical memory, and I/O tanks when there isn't enough.

A 12 TB disk should have around 14 GB set.

…gimme a sec, just thinking.

Also note that when the container starts and the ZFS cache is empty, the IO delay still increases to 30% instantly; I tried multiple times with rebooting. If the container is not running, however, I can easily write 100 MB/s over Samba for long durations. It does not matter how much data is written, e.g. a data cap of 1 Mb/s will still result in high IO/latency issues.

This is how it looks when I block the connection:


Wait, you only have 16 GB of memory.

Don't take my word for it, try this: https://pve.proxmox.com/wiki/ZFS_on_Linux

Scroll down, there's the bit about memory requirements…

Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage

I was going to get you to do:

echo "$[14 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max

I don’t know what to suggest…

hmm, wait…

So when you start the container, it sounds like you have the filewalker enabled on startup for Storj; that uses lots of disk.

try…

Find the config.yaml file in the storage node directory.

Look for this line; if you haven't got it, add it:

# if set to true, all pieces disk usage is recalculated on startup
# storage2.piece-scan-on-startup: true

Then remove the # and change it to false:

# if set to true, all pieces disk usage is recalculated on startup
storage2.piece-scan-on-startup: false

Restart the storage node… does that help?
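
With the usual Docker setup the restart would be something like this (assuming your container is actually named storagenode):

docker restart storagenode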


It helps for a few minutes, but then it still goes down a lot.


For max performance:

zfs set sync=disabled poolname   # stop honouring sync writes (risk: up to a few seconds of acknowledged writes lost on power failure)
zfs set atime=off poolname       # don't update access times on every read
zfs set xattr=sa poolname        # store extended attributes in the dnode instead of hidden files
echo "options zfs zfs_txg_timeout=30" >> /etc/modprobe.d/zfs.conf   # flush transaction groups every 30 s instead of the default 5 s
update-initramfs -u              # rebuild the initramfs so the module option applies at boot