ZFS fragmentation (mostly dbs)

New nodes are incredibly fast at first, but I’ve noticed things start to slow down after a while. Recently, I tried copying some database files to a different location, and the transfer of piece_expiration.db (1.5-2GB) was painfully slow, around 5-10 MB/s. The issue? Fragmentation.

My pools are currently 50-60% fragmented, and piece_expiration.db is a prime example. After moving the file (which removed the fragmentation), transfer speeds went back to 150-200 MB/s.
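A rough way to put numbers on the before/after comparison described above is to time a sequential read of the file. This is only a sketch: the function name read_speed is made up for illustration, the timing has one-second resolution, and a file that is already in the ARC will read much faster than it would from disk, so results are only meaningful on a cold cache.

```shell
#!/bin/sh
# read_speed FILE: read FILE sequentially and print approximate throughput.
# Hypothetical helper for illustration; not part of any ZFS tooling.
read_speed() {
    file="$1"
    start=$(date +%s)
    # Pipe through cat so the data is really read; "wc -c < file" may
    # just look up the file size without touching the blocks.
    bytes=$(cat "$file" | wc -c)
    end=$(date +%s)
    bytes=$((bytes + 0))            # normalise whitespace from wc
    elapsed=$((end - start))
    # Avoid dividing by zero for files that read in under a second.
    if [ "$elapsed" -lt 1 ]; then
        elapsed=1
    fi
    echo "$((bytes / elapsed / 1000000)) MB/s ($bytes bytes in ${elapsed}s)"
}
```

Call it with the path to the database file once before and once after moving it, and compare the two figures.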

How are you dealing with fragmentation issues, especially on large nodes?

My ZFS setup:

Special device on NVMe
Compression: on
Recordsize: 1M for storage, 64K for DBs
Atime: off
Sync: off
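
For reference, the settings above could be applied roughly as follows. This is a sketch under assumptions: the dataset names pool/storagenode and pool/storagenode/databases are borrowed from the command quoted later in the thread and may not match the actual layout, the properties are assumed to be set on the parent dataset and inherited, and "Sync: off" is assumed to mean the ZFS value sync=disabled. The special device is assumed to exist already. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumed names; adjust to the real pool and dataset layout.
POOL="${POOL:-pool}"
# Dry run by default: commands are printed, not executed.
# Run with RUN= (empty) to actually apply them.
RUN="${RUN-echo}"

apply_settings() {
    # Set on the parent dataset and inherited by children.
    $RUN zfs set compression=on "$POOL/storagenode"
    $RUN zfs set atime=off "$POOL/storagenode"
    $RUN zfs set sync=disabled "$POOL/storagenode"
    # 1M records for piece storage, 64K for the databases dataset.
    $RUN zfs set recordsize=1M "$POOL/storagenode"
    $RUN zfs set recordsize=64K "$POOL/storagenode/databases"
}

apply_settings
```

A recordsize change only affects blocks written after the change; existing files keep their old block size until they are rewritten.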

Your statements make no sense to me… ZFS is CoW (copy-on-write), so wouldn’t a DB always be fragmented?
Maybe the WAL saves it somehow.

Specifically for that DB file… isn’t it going away soon in favor of expirations being tracked in regular flat files?

If you only notice degraded speeds during bulk maintenance tasks… but the node is speedy (as your special metadata device makes all filewalker/GC/trash operations fast), I wouldn’t worry about it.

If it bugs you… a periodic zfs export/import/rename will restore performance… but doesn’t seem to be worth the time.


I’m not, because fragmentation is not an issue.

  • Copying databases is not a use case that needs optimizing; databases are not accessed sequentially.
  • Databases on an HDD are very slow, regardless of fragmentation level.
  • To make databases faster, you need to get the HDD out of the data path. One option is to have sufficient caching; another is to force the databases onto the SSD.

Missed this. So the solution for you is

zfs set special_small_blocks=64K pool/storagenode/databases

My databases are on SSD. The size needed isn’t that large, so it isn’t a hardship.

How do you know?
The only fragmentation score I know of is about free space, which is quite different from fragmentation of the used space.

Everything started when I noticed slow trash deletion on one of my ZFS pools (almost as slow as ext4). The 10TB node had 60% fragmentation (zpool get fragmentation). At that point, I tried moving all the databases to the SSD, and I noticed that moving the 2GB piece_exp file was going at 5-10 MB/s (rsync). That’s it. I assumed fragmentation was causing the slowdown. I moved the file to eliminate the fragmentation, and the speed went back to normal. Normally I don’t have problems with the DBs on the same HDD (in a separate dataset with 64K recordsize). I will investigate and maybe move everything off the HDD (I want to keep my setups ultra simple).
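
The zpool get fragmentation check mentioned above can be wrapped in a small script. This is a sketch, not a diagnostic: the pool name and the threshold are placeholders, the function names are made up, and the value reported is the pool's free-space fragmentation, which says nothing direct about how any single file is laid out.

```shell
#!/bin/sh
# pool_frag POOL: print the pool's fragmentation value without the "%".
pool_frag() {
    # -H drops the header, -o value prints only the value column.
    zpool get -H -o value fragmentation "$1" | tr -d '%'
}

# frag_report POOL LIMIT: compare the value against LIMIT percent.
# LIMIT is an arbitrary threshold chosen by the operator.
frag_report() {
    pct=$(pool_frag "$1")
    case "$pct" in
        # Some pools report "-" instead of a number.
        ''|*[!0-9]*) echo "$1: fragmentation not reported ($pct)"; return 0 ;;
    esac
    if [ "$pct" -gt "$2" ]; then
        echo "$1: free-space fragmentation ${pct}% exceeds ${2}%"
    else
        echo "$1: free-space fragmentation ${pct}% is within ${2}%"
    fi
}
```

For example, frag_report pool 50 prints one line stating whether the pool named pool is above or below 50%.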

But if you haven’t actually measured where the bulk of the time is spent, how could you decide that it had anything to do with databases or fragmentation, or that this separate use case of copying a database file is in any way representative?


Translation of my first post:
“Hey guys! I noticed this thing. The transfer of a simple file was really slow… It seems that it’s really heavily fragmented. By moving it, I solved the problem. Could it be that the general slowness is due to excessive fragmentation of the whole pool? Have you had the same experience?”

General answers: No…

Me: Ok! I’m going to spend time on this… it seems to be a specific problem.