I was curious because my ZFS Storj data folders were showing some unexpected compressibility. So I checked my piece files and found 256 zero bytes at the end of all files. What could be the reason for this empty space?
There should not be 256 zero bytes at the end of all files. Something is quite broken. I can’t think of any misconfiguration or bug that would cause such a thing, though.
Are you sure it’s all piece files?
Even the oldest ones, and even the newest ones? Piece files for different satellites?
What about other files on the same volume, or in the same directory if any? Do they have the mysterious zero byte endings as well?
Finally, can you try to run the is-valid-sj1-blob utility on the pieces? If you have Go installed, it should work as go run storj.io/storj/cmd/tools/is-valid-sj1-blob@latest <piecefile>. Or if you don’t have Go installed and don’t want to install it, we can send you an executable if you let us know what platform and architecture to build it for.
You might also try running is-valid-sj1-blob with the -find-length option. If your piece files are valid but have extra bytes on the end, this will determine that.
I checked 3 random files on my nodes; all of them had these zero bytes as well.
I checked some old files from 2021 and found the same zero bytes. Also, I noticed it is not always 256 zero bytes: some files have the next-to-last byte set to some other value.
I’m very interested now in the is-valid-sj1-blob output.
Ok, I ran some tests in a Storj test setup, and I do indeed see about 20-25% of the piece files end in 256 zero bytes (with the next-to-last byte sometimes being different). The piece files validate, though; they have the correct hash and size. So it’s something happening within the encoding step, and (assuming it’s the same cause for you) not something you need to worry about.
I’m not sure why you would see all files with this zero-padding.
It’s weird, though. It’s not plain encryption padding; AES-GCM operates as a stream cipher (CTR mode underneath) and doesn’t need padding. And I don’t recall there being a need for that much padding in the forward error correction output. I’ll ask some people about this after the weekend.
Ok, yes, we do add enough padding after encryption that the output shares are always ShareSize bytes (since ShareSize is currently 256 for all objects, we need to add up to 29*256=7424 bytes of padding) before FEC encoding. When a piece’s index is less than k=29 and the piece comes from the last segment of an object, there is a good chance that part of that padding will show up in the last share in the piece. You can’t directly see the piece index for pieces on your node, but 20-25% is a reasonable estimate for how often that will happen.
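To put some numbers on that, here is roughly how the padding amount falls out of the figures above (a sketch under my reading of the post; the function name is mine, and I’m assuming a segment already at a stripe boundary still gets a full stripe of padding, which is one way to read “up to 7424 bytes”):

```go
package main

import "fmt"

const (
	k          = 29             // minimum shares needed to reconstruct
	shareSize  = 256            // bytes per share per stripe (ShareSize above)
	stripeSize = k * shareSize  // 29*256 = 7424 bytes per stripe
)

// paddingFor returns how many padding bytes would be appended to an
// encrypted segment of the given length so that it fills out a whole
// number of stripes, making every FEC output share a multiple of
// shareSize. Assumption: a segment exactly on a stripe boundary still
// gets a full extra stripe of padding (hence "up to 7424 bytes").
func paddingFor(segmentLen int) int {
	return stripeSize - segmentLen%stripeSize
}

func main() {
	fmt.Println(paddingFor(7000)) // 424 bytes of padding
	fmt.Println(paddingFor(1))    // 7423 bytes of padding
}
```

Since the padding is mostly zeros, any piece that contains the tail of that padded region will end in a run of zero bytes, which matches what’s being observed.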
So it’s normal to have a significant fraction of your pieces end in 256 zeroes (with possibly a few different bytes at the end). It’s not normal, as far as I can tell, for all pieces to end with that.
After doing some scripted checks I can confirm this estimate. Only about 25% of files have 256 zeroes. Most of the remaining files have the next-to-last byte non-zero, and a small number of files have more non-zero bytes within the last 256.
I’m still a bit worried. It still sounds like nearly all of your piece files have 254-256 zero bytes at the end.
Most of your files should not end in zeros at all (or, well, one zero byte on average out of the last 256).
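For intuition on that “one zero byte on average” figure: encrypted data should look uniformly random, so each byte is zero with probability 1/256, giving an expectation of exactly 1 zero byte in a 256-byte tail. A tiny simulation confirms this (the function name is mine; this is just statistics, not anything Storj-specific):

```go
package main

import (
	"fmt"
	"math/rand"
)

// averageZeroTail draws `trials` random 256-byte tails and returns the
// mean count of zero bytes per tail. For uniformly random data each
// byte is zero with probability 1/256, so the expectation is 1.
func averageZeroTail(trials int) float64 {
	buf := make([]byte, 256)
	total := 0
	for i := 0; i < trials; i++ {
		rand.Read(buf)
		for _, b := range buf {
			if b == 0 {
				total++
			}
		}
	}
	return float64(total) / float64(trials)
}

func main() {
	fmt.Printf("average zero bytes in a random 256-byte tail: %.3f\n",
		averageZeroTail(100000))
}
```

So a tail that is almost entirely zeros across nearly all pieces would be a very strong deviation from random, which is why the all-files result looked alarming.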
Overall, 99.7 percent of the last 256 bytes are zeros.