Node using a lot of extra diskspace (ZFS)

idfxken · May 12, 2022, 10:18am

Just a small update: I decided not to rebuild the dataset, as the situation should only improve, at least partially, as data is written/trashed.

Results in only 1 week:

3 may:

Total: 4.000 TB Used: 2.811 TB Trash: 49.02 GB
storage/STORJ 9.7T 5.3T 4.5T 55% /storage/STORJ

12 may:

Total: 4.000 TB Used: 3.021 TB Trash: 45.94 GB
storage/STORJ 9.7T 5.3T 4.5T 55% /storage/STORJ

On closer inspection: 210 GB of data came in, while used disk space increased by only around 10 GB.

The compression ratio is climbing from its 1.00x baseline:

storage/STORJ compressratio 1.10x -

Is it possible that storj is (partially) rewriting data, while doing its routine jobs?
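A rough sketch of the arithmetic behind the 3 May and 12 May figures, for anyone checking their own node. It assumes the dashboard reports decimal TB and that the `df` line was produced with `-h`, which prints binary units (T = TiB); the function names are made up for illustration.

```python
TIB = 2**40  # df -h prints binary units, so "5.3T" means 5.3 TiB


def node_growth_gb(used_before_tb, used_after_tb):
    """Growth of the node's reported usage in decimal GB (assumes decimal TB input)."""
    return round((used_after_tb - used_before_tb) * 1000, 3)


def df_step_gb(decimals=1):
    """Size in decimal GB of one last-digit step of a df -h figure printed in TiB."""
    return (10 ** -decimals) * TIB / 10**9


growth = node_growth_gb(2.811, 3.021)  # dashboard "Used" on 3 May and 12 May
step = df_step_gb()                    # resolution of the "5.3T" column

print(f"node grew by about {growth:.0f} GB")
print(f"one df -h digit is about {step:.0f} GB")
```

Under those assumptions a single step of the last digit in `5.3T` is far larger than 10 GB, so an unchanged `df -h` line is consistent with the small on-disk growth reported above; `zfs list -p` or `df -B1` gives exact byte counts.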

SGC · May 12, 2022, 6:04pm

i think at least two things are happening here, or might be…

the space which was considered used might be considered empty…
and
storj’s files seem to grow in size over time… and thus might be rewritten.

zfs is designed for everything to happen on the fly, so in most cases one shouldn’t need to take the pool down, so long as it isn’t in a catastrophic state…
it’s also possible that zfs might rewrite the files if it is working with them anyways…
zfs does all kinds of stuff that is very difficult to explain.

another thing is that storj data gets deleted, so in many cases these files will be mostly empty space that got freed up, and when new files are written with compression on, they will take much less space…

not because the files are being compressed, but because the empty space in a file isn’t written on the disk, because that can be compressed…
i usually run ZLE for storj data… but it doesn’t seem to make a major difference so long as one doesn’t go crazy and use something like gzip-9 or whatever the parameter is.
ZLE will have the lowest footprint on the cpu, but LZ4 and ZSTD-1 will have the lowest memory footprint.

i think the default compression in ZFS today is ZSTD-3, but i don’t have a ton of cpu to work with so i run ZSTD-1 because the numbers make sense: it’s about the same workload as LZ4 and about the same performance, aside from in some cases x5 better compression.

ZLE is Zero Length Encoding… basically it writes a short marker like 0x64 (a count of the zeros) instead of 00000000000000000000000000000000000000000000000000000000000000…
to put it plainly.
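A toy sketch of the zero-run idea described above. This is not the actual ZFS ZLE on-disk format, only an illustration of replacing a run of zero bytes with a count; the token layout and function names are invented for the example.

```python
def zle_encode(data):
    """Toy zero-run encoder: returns a list of ('zeros', count) or ('literal', bytes) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        j = i
        if data[i] == 0:
            # Collapse the whole run of zero bytes into a single count.
            while j < len(data) and data[j] == 0:
                j += 1
            tokens.append(("zeros", j - i))
        else:
            # Non-zero bytes are kept verbatim.
            while j < len(data) and data[j] != 0:
                j += 1
            tokens.append(("literal", data[i:j]))
        i = j
    return tokens


def zle_decode(tokens):
    """Inverse of zle_encode: expand the counts back into zero bytes."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "zeros":
            out.extend(bytes(value))  # bytes(n) is n zero bytes
        else:
            out.extend(value)
    return bytes(out)


sample = b"ab" + bytes(64) + b"c"
encoded = zle_encode(sample)
```

Here the 64 zero bytes in `sample` become one `("zeros", 64)` token, which is why mostly empty blocks cost almost nothing with even the cheapest compression setting.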

one can really get into the weeds on this stuff… LZ4 also has a very interesting way of operating, which is why it can compress so fast, but isn’t super good at it… but it’s a great trade-off between work vs effective compression and only recently got replaced by the better compression scheme.

there are quickly diminishing returns on compression.
but apparently it also works the other way, one can save so much initially that not using compression is a mistake.

littleskunk · May 12, 2022, 8:32pm

It might have to do with the way transfersh works. It uploads data with an expiration date. So you might get 210 GB of new data while, at the same time, 200 GB from a few weeks ago expires and gets deleted. Expired data is not moved into trash.

SGC · May 13, 2022, 6:14am

yeah like 30-50% of ingress doesn’t seem to stick around

foegra · November 25, 2024, 12:17am

Hi, guys. I’ve been googling around about the same topic and noticed that the pool’s default write size (record size) is 128 KB. Many storj blob files are smaller than that. So I’ve changed it to 4 KB and am currently observing.

What’s your opinion here?

mattventura · November 25, 2024, 2:57am

ZFS won’t actually take up the entire recordsize for something smaller than a single record, but it will take up a multiple of record size for anything larger than a single record. i.e. with 128k record size, a 10k file will take up 10k, but a 130k file will take up 256k. So it’s correct to say that a smaller record size can help. However, you can also lift the “round up to next recordsize multiple” limitation by enabling any form of compression.
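The rule of thumb above can be written out as a small model. It is a simplification that ignores sector (ashift) rounding, metadata, and RAIDZ padding, and it treats compression as merely removing the zero padding of the last record; the function name and parameters are made up for illustration.

```python
import math

KIB = 1024


def allocated_bytes(file_size, recordsize=128 * KIB, compression=False):
    """Approximate on-disk size of one file under the simplified rule above."""
    if file_size <= recordsize:
        # A file smaller than one record gets a single block sized to the file.
        return file_size
    if compression:
        # The padding in the last record is zeros, which any compressor
        # (even ZLE) shrinks to almost nothing.
        return file_size
    # Without compression every record, including the last, is full size.
    return math.ceil(file_size / recordsize) * recordsize


small = allocated_bytes(10 * KIB)                           # 10k file
large = allocated_bytes(130 * KIB)                          # 130k file, no compression
large_compressed = allocated_bytes(130 * KIB, compression=True)
large_4k = allocated_bytes(130 * KIB, recordsize=4 * KIB)   # the 4 KB setting tried above
```

With these inputs the model reproduces the numbers from the post: the 10k file stays at 10k, the 130k file rounds up to 256k without compression, and both compression and a 4k record size bring it back close to its real size.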

Alexey · November 25, 2024, 6:57am

Hello @foegra,
Welcome to the forum!

Usually people just enable compression, so space isn’t wasted on small files due to the 128 KiB record size.

Ottetal · November 25, 2024, 7:25am

Welcome to the community friend

foegra · November 25, 2024, 8:26am

Hi again!
Thanks for the greeting and your reply.

@mattventura - that was a comprehensive explanation.

Interesting. I’ve had compression enabled from the very beginning (LZ4 is the default option in TrueNAS Scale 24.10). At the moment the STORJ node shows 670 GB of filled space, but the space actually taken is 777 GB. So the difference is now 107 GB.

Yesterday, before I decreased the write size from 128 KB to 4 KB, the difference was 110 GB.
Until then the difference was increasing, but I have no evidence to prove it.

I’m a beginner with TrueNAS; these are just my observations. If the difference in occupied space is not due to ZFS, what else could it be? What amount of difference is considered OK?

EasyRhino · November 25, 2024, 3:30pm

regarding the initial topic, I also enable compression on my zfs disks. the compression ratio on a storj drive is only 1.08, so the % isn’t too exciting, but it’s nice to know space isn’t being wasted.

@foegra there are a variety of ways that the storj dashboard can be off, either by design or due to bugs. a go-to option is to restart the node and give it hours or even days (if multi-terabyte) to finish running a used space filewalker to account for all the pieces and update the dashboard.
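For an independent cross-check against the dashboard, a minimal sketch that walks a directory and sums file sizes. This is not the node's own used space filewalker, just a stand-alone approximation; the function name is invented, and the path in the usage comment is hypothetical.

```python
import os


def used_space(root):
    """Return (apparent_bytes, allocated_bytes, file_count) for all files under root."""
    apparent = allocated = count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished while walking, e.g. deleted by the node
            apparent += st.st_size
            # On POSIX, st_blocks counts 512-byte units actually allocated;
            # fall back to the apparent size where it is unavailable.
            blocks = getattr(st, "st_blocks", None)
            allocated += blocks * 512 if blocks is not None else st.st_size
            count += 1
    return apparent, allocated, count


# Usage (hypothetical path, adjust to your own storage location):
# apparent, allocated, count = used_space("/storage/STORJ/storage/blobs")
```

The apparent total is the figure to compare with the dashboard, while the allocated total is closer to what the filesystem reports; on a large node the walk can take a long time, just like the built-in one.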

foegra · November 25, 2024, 8:15pm

Will see how it goes.
In any case, 1.08 is better than 1.
Is there an option to force the node to go through the files and check their content, integrity, etc.?

Alexey · November 26, 2024, 6:26am

No. But the restart would calculate the usage and update the databases.
The satellites are auditing your node, so if something is corrupted or missing, your audit score would go down.

foegra · November 26, 2024, 7:07am

Does it remove unnecessary files as well?
After the restart my node size went down from 630 GB to 580 GB, but the actual used space on the hard drive did not shrink.

foegra · November 26, 2024, 10:47am

In my case the problem probably started when I moved the existing STORJ node, while it was running, from one disk to another by replacing disks from the TrueNAS Scale menu. I have restarted the node from scratch; otherwise, every time I restarted the node, the node size went back to 580 GB.

edortaprz · November 26, 2024, 3:00pm

This is a known issue.

Still not recognized as a bug, but we will see…

foegra · November 26, 2024, 3:14pm

Do you mean the known issue is breaking a Storj node by replacing a pool disk while the node itself is running?

edortaprz · November 26, 2024, 3:27pm

No.

After rebooting, the data recorded since the last reboot is lost.

foegra · November 26, 2024, 3:50pm

All right, I see.
Then I should not have recreated the whole node from scratch. I confirm: I rebooted the node and the used space went down to the value it had before the previous reboot.

Question: is this only a UI issue, or does it affect the payment calculation as well?

Alexey · November 27, 2024, 3:32am

This is perhaps related to this bug:

It’s not lost; the usage stat becomes broken, which doesn’t affect the actual usage.

It’s a UI issue, but if the node thinks it is full according to this stat, it will stop receiving new data even if it actually has free space.

edortaprz · November 27, 2024, 3:37am

"It’s not lost; the usage stat becomes broken, which doesn’t affect the actual usage."

Sorry, it is better explained as: the data is still on the HDD, as you said.