Storage of extreme-size files

1. Is there a minimum file size limit below which the division into pieces, erasure coding, and so on can't work?
I mean, you can't divide a 1-byte file into different pieces, can you?
How are small files stored?
Is there inflation with useless padding data that is removed when the file is recreated?
2. Is there a maximum size limit for files that can be stored on the Storj network?

Small files are stored as part of metadata on a satellite.

A file may be small enough that it consists of only one segment. If that segment is smaller than the metadata required to store it on the network, the data will be stored inline with the metadata.² We call this an inline segment.

² The Linux file system Ext4 performs the same optimization with inline inodes [56].

[56] Tao Ma. ext4: Add inline data support. https://lwn.net/Articles/468678/, 2011.

Inline Segment: An inline segment is a segment small enough that the data it represents takes less space than the metadata a remote segment would need to keep track of which nodes hold the data. In these cases, the data is stored "inline" instead of being stored on nodes.

It’s in the documentation.
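To make the quoted rule concrete, here is a minimal Python sketch of the inline-versus-remote decision it describes. This is illustrative only: the piece count, per-piece record size, and resulting threshold are assumed round numbers, not Storj's actual parameters or code.

```python
# Sketch of the inline-segment decision quoted above.
# All numbers are assumptions for illustration, not Storj's real parameters.

from dataclasses import dataclass

# Rough size of the bookkeeping a remote segment needs: one record per piece
# (node ID, piece number, and so on).
PIECES_PER_SEGMENT = 80          # assumed number of erasure-coded pieces
BYTES_PER_PIECE_RECORD = 100     # assumed metadata bytes per piece record
REMOTE_METADATA_SIZE = PIECES_PER_SEGMENT * BYTES_PER_PIECE_RECORD


@dataclass
class InlineSegment:
    data: bytes                  # the payload itself lives in satellite metadata


@dataclass
class RemoteSegment:
    piece_locations: dict        # piece number -> node ID, kept in metadata


def upload_pieces(data: bytes) -> dict:
    """Placeholder for erasure coding the payload and uploading pieces to nodes."""
    return {i: f"node-{i}" for i in range(PIECES_PER_SEGMENT)}


def store_segment(data: bytes):
    """Store the payload inline if it is smaller than the metadata a remote
    segment would need anyway; otherwise erasure-code it onto storage nodes."""
    if len(data) <= REMOTE_METADATA_SIZE:
        return InlineSegment(data=data)
    return RemoteSegment(piece_locations=upload_pieces(data))
```

In other words, storing a tiny payload on nodes would cost more metadata than just keeping the payload in the metadata record itself, so the inline path is the cheaper option for very small files.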


And as a consequence, it's more expensive; that's why we have a segment fee, to incentivize using larger segments (64 MiB or more).
We also support ZIP archives via linksharing, so if you still need a lot of small objects as static content, you can pack them into a ZIP archive, upload it to the bucket, and then serve content from that archive (see the sketch below).
For cases where you do not need to share content, it is better to use a specialized backup solution instead of a simple sync.
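A minimal sketch of that packing step, assuming Python, a local directory of small files, and an already-configured uplink CLI; the directory, archive name, and bucket are placeholders.

```python
# Pack many small files into one ZIP and upload it as a single object,
# so it counts as one larger segment instead of thousands of tiny ones.
# Assumes the `uplink` CLI is installed and configured; "sj://static-assets"
# is a placeholder bucket name.

import subprocess
import zipfile
from pathlib import Path

SOURCE_DIR = Path("./site-assets")       # directory full of small files
ARCHIVE = Path("./site-assets.zip")
DESTINATION = "sj://static-assets/site-assets.zip"

with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for path in SOURCE_DIR.rglob("*"):
        if path.is_file():
            zf.write(path, arcname=path.relative_to(SOURCE_DIR))

# Upload the single archive instead of each small file individually.
subprocess.run(["uplink", "cp", str(ARCHIVE), DESTINATION], check=True)
```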

I'm wondering whether, in all these performance comparisons with Storj's competitors, anyone has also tested and compared upload/download times of very small files that don't land on nodes.

Small-file performance will be atrocious with any remote storage service: the fixed per-request overhead drastically exceeds the time needed for the payload transfer. It's the same reason small-file performance on an HDD is several orders of magnitude worse than large-file performance: seek time dominates for small files.
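For a rough sense of why the fixed overhead dominates, here is a back-of-envelope calculation; the 100 ms per-object overhead and 100 MB/s throughput are assumed round numbers, not measurements of Storj or any competitor.

```python
# Back-of-envelope: fixed per-request overhead vs. payload transfer time.
# The figures below are assumptions for illustration, not measurements.

fixed_overhead_s = 0.10          # assumed per-object overhead (handshakes, metadata round trips)
throughput_bytes_per_s = 100e6   # assumed sustained transfer rate, ~100 MB/s

for size_bytes in (1_000, 1_000_000, 64_000_000):
    transfer_s = size_bytes / throughput_bytes_per_s
    total_s = fixed_overhead_s + transfer_s
    effective = size_bytes / total_s
    print(f"{size_bytes:>12,} B: transfer {transfer_s * 1000:8.2f} ms, "
          f"total {total_s * 1000:8.2f} ms, effective {effective / 1e6:7.2f} MB/s")
```

Under these assumptions a 1 kB object achieves roughly 0.01 MB/s effective throughput while a 64 MB object achieves roughly 86 MB/s, which is where the orders-of-magnitude gap comes from.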

There is no point in benchmarking that; it's a use case we discourage.


You are correct, but we now handle small files better than our competitors too.
