Node using a lot of extra diskspace (ZFS)

Just a small update: I decided not to rebuild the dataset, as the situation should only improve, at least partially, as data is written/trashed.

Results in only 1 week:

3 May:

Total: 4.000 TB Used: 2.811 TB Trash: 49.02 GB
storage/STORJ 9.7T 5.3T 4.5T 55% /storage/STORJ

12 May:

Total: 4.000 TB Used: 3.021 TB Trash: 45.94 GB
storage/STORJ 9.7T 5.3T 4.5T 55% /storage/STORJ

On closer inspection: 210 GB of data came in, while used disk space increased by only around 10 GB :slight_smile:

The compression ratio is shooting up from 1.00x:

storage/STORJ compressratio 1.10x -
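For anyone wanting to watch this on their own pool, the relevant properties can be queried directly (the dataset name below is the one from this thread; substitute your own):

```shell
# Compare logical vs physical space for the dataset:
# 'logicalused' is what the data would occupy uncompressed,
# 'used' is what it actually takes on disk,
# 'compressratio' is the ratio between the two.
zfs get compression,compressratio,used,logicalused storage/STORJ
```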

Is it possible that Storj is (partially) rewriting data while doing its routine jobs?

I think at least two things are happening here, or might be…

  1. The space which was considered used might now be considered empty…
    and
  2. Storj's files seem to grow in size over time… and thus might be rewritten.

ZFS is designed for everything to happen on the fly, so in most cases one shouldn't need to take the pool down, so long as it isn't in a catastrophic state…
It's also possible that ZFS rewrites files it is working with anyway: ZFS is copy-on-write, so any modified block is written to a new location rather than overwritten in place.
ZFS does all kinds of stuff that is very difficult to explain.

Another thing is that Storj data gets deleted, so in many cases these files will be mostly empty space that was freed up, and when new files are written with compression on, they will take much less space…

Not because the file contents themselves compress well, but because the empty space in a file isn't written to disk, since runs of zeros can be compressed away…
I usually run ZLE for Storj data… but it doesn't seem to make a major difference, so long as one doesn't go crazy and use something like gzip-9 or whatever the parameter is.
ZLE will have the lowest footprint on the CPU, but LZ4 and ZSTD-1 will have the lowest memory footprint.

I think the default ZSTD level in ZFS today is ZSTD-3, but I don't have a ton of CPU to work with, so I run ZSTD-1 because the numbers make sense: it's about the same workload as LZ4 and about the same performance, except with up to 5x better compression in some cases.
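Switching algorithms is a single property change, though it only applies to newly written blocks; existing data keeps whatever compression it was written with until it is rewritten (dataset name is this thread's example, level choice is just the one discussed above):

```shell
# Only affects blocks written after this point; old blocks stay as-is.
zfs set compression=zstd-1 storage/STORJ

# Confirm the new setting took effect.
zfs get compression storage/STORJ
```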

ZLE is Zero Length Encoding… to put it plainly, instead of writing out a long run of zero bytes like 00000000000000000000000000000000, it just stores a count of how many zeros there were.
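The effect is easy to demonstrate outside ZFS too. Here's a rough sketch using gzip rather than ZLE itself, but the same principle applies to any compressor that handles runs of repeated bytes:

```shell
# A 1 MiB file containing nothing but zero bytes...
dd if=/dev/zero of=/tmp/zeros.bin bs=1024 count=1024 2>/dev/null

# ...compresses to almost nothing, because a zero run is just
# "repeat 0x00 N times" to the compressor.
gzip -kf /tmp/zeros.bin

# Compare the sizes of the original and compressed files.
ls -l /tmp/zeros.bin /tmp/zeros.bin.gz
```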

One can really get into the weeds on this stuff… LZ4 also has a very interesting way of operating, which is why it can compress so fast but isn't super good at it… still, it's a great trade-off between work and effective compression, and it only recently got displaced by the better ZSTD scheme.

There are quickly diminishing returns on compression.
But apparently it also works the other way: the initial savings are so large that not using compression at all is a mistake.

It might have to do with the way transfersh works. It uploads data with an expiration date, so you might get 210 GB of new data while at the same time 200 GB from a few weeks ago expires and gets deleted. Expired data is not moved into trash.


Yeah, like 30-50% of ingress doesn't seem to stick around :smiley: