ZFS reporting much more HDD space used

I’m not a novice with ZFS, but I don’t understand why the amount of HDD space ZFS reports as used is so far off from what the Storj dashboard shows.

Any other ZFS-backed Storj users that can enlighten me?

The dashboard reports 13 GB used, 3.7 GB trash.

ZFS reporting:

NAME        AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  LUSED
tank/storj  1.98T  21.9G        0B   21.9G             0B      3.71M  35.4G
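(For anyone wanting to compare on their own node, the output above comes from something like the following; the dataset name is the one from my pool:)

zfs list -o space,logicalused tank/storj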

I’m really enjoying getting started with Storj. That terabyte was just sitting there unused anyhow!

It could be related to the sector size (ashift) used in the zpool.
The dashboard calculates used space via a filesystem call; you can get similar results by running du -s on the blobs folder in the storage location. The size of the databases is not included.
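For example, something like this (the path is just an illustration; adjust it to your storage location):

# on-disk usage of the blobs folder, human-readable
du -sh /tank/storj/storage/blobs
# apparent (logical) size, closer to what the dashboard counts
du -sh --apparent-size /tank/storj/storage/blobs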

If you enable compression, this overhead should be reduced.
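Assuming the dataset name from this thread, it is a single property; note that compression only applies to data written after it is enabled:

zfs set compression=lz4 tank/storj
zfs get compression tank/storj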

Compression is already on (but wasn’t always); perhaps the first 5 GB was uncompressed. I was surprised to see any effective compression since the data is encrypted, but it’s actually fairly good:

NAME            PROPERTY       VALUE  SOURCE
tank/storj      compressratio  1.62x  -
tank/storj/dbs  compressratio  1.79x  -
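For reference, that output is from a query along these lines:

zfs get compressratio tank/storj tank/storj/dbs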

@Alexey my ashift and sector size shouldn’t be an issue: ashift=13 and the drives have 4k sectors.
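If anyone wants to double-check their own drives, the logical and physical sector sizes can be read with something like this on Linux (the device name is only an example):

lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda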

It’s not the data itself that gets compressed. The compression savings mostly come from the padding needed to round each record up to the sector size.
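One way to see how much that padding compression actually saves is to compare the on-disk usage against the logical size for the dataset:

zfs get used,logicalused,compressratio tank/storj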

Ah, that makes sense. I’ve debated what to set for recordsize based on other threads about using ZFS with Storj. Several mention an average file size of approximately 2MB on their nodes, but it’s closer to 260K on mine for the time being.
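If anyone wants to check the average piece size on their own node, a rough one-liner like this works (the path is an example; point it at your blobs folder):

find /tank/storj/storage/blobs -type f -printf '%s\n' \
  | awk '{ s += $1; n++ } END { if (n) printf "%d files, avg %.0f KiB\n", n, s/n/1024 }'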

Space isn’t a worry for me since I’m far away from the 1TB; I ask purely out of tinkering interest.

Err, ashift=13 is an 8k block size… 4k is ashift=12.
The block size is 2^ashift: ashift=9 is 512B, and it doubles with every step,
so 10 is 1k, 11 is 2k, 12 is 4k, and so on.
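You can confirm which ashift a pool was actually created with via zdb, e.g. (pool name from this thread, assuming a standard zpool.cache):

zdb -C tank | grep ashift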

Many of the files stored are 4k or 1k, which explains your high capacity used. You should remake the pool, because ashift cannot be changed on an existing pool.

Do remember to move your storagenode before you destroy the pool :smiley:
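A minimal sketch of one way to move it with zfs send/receive, assuming the node is stopped and a second pool (here called backup) exists:

zfs snapshot -r tank/storj@migrate
zfs send -R tank/storj@migrate | zfs receive -u backup/storj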

On recordsize I would recommend 256k, but most settings work as long as you don’t go too low. I wouldn’t stray too far from the default 128k, because it’s a very good general-purpose choice.
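If you change it, it is a single property, and it only affects newly written data (existing records keep their size):

zfs set recordsize=256K tank/storj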

ZFS record size is also dynamic, so the recordsize setting is more of a maximum record/block size.
I have tested out to 2MB, but it does weird things to RAM and the caches, because cached data is dropped for incoming data based on the maximum possible record size.

You can also go all the way to 16MB record sizes, but that requires disabling some failsafes, because it can really mess up your system.
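On Linux that ceiling is the zfs_max_recordsize module parameter; as a sketch, it can be inspected with (path may vary by platform and OpenZFS version):

cat /sys/module/zfs/parameters/zfs_max_recordsize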

The ashift choice was deliberate, for a bit of future-proofing, but yes, I’m aware that it does cost me some additional space on 4k drives. The Storj dataset is about 10% of the zpool and I don’t see nearly the same effect on the other datasets. What you said about file sizes makes me realize that is the likely cause.

I’ll just absorb the additional storage space attributed to Storj; it won’t make or break the system.

Thanks for the responses. I learned about record padding and the effect of small file sizes on the pool, which is useful knowledge beyond Storj.