Storage of small files in the network

As I understand it, small files are stored on Storj servers, because they are too small to be split into pieces across the public network.
Can’t they be gathered together into bigger files, like the logs we are using in hashstore, and those spread in pieces across the network?
This way, the central servers would be freed from them, and the safety of the data would be assured in the same way as with big files.
Is it a possibility?

Look at it this way:

Small files are stored inline because the overhead of fetching a small amount of data from nodes is way too expensive: the latency is much higher than the time needed to actually transfer the data.
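
To put rough numbers on that (illustrative assumptions of mine, not measured Storj figures):

```go
package main

import "fmt"

func main() {
	// Illustrative assumptions, not measured Storj figures.
	const rttSeconds = 0.100         // assumed round trip to a storage node: 100 ms
	const sizeBytes = 4 * 1024.0     // a "small" 4 KiB file
	const bytesPerSecond = 100e6 / 8 // assumed 100 Mbit/s link

	transfer := sizeBytes / bytesPerSecond // time spent actually moving bytes
	fmt.Printf("transfer: %.2f ms vs round trip: %.0f ms\n",
		transfer*1000, rttSeconds*1000)
	// Output: transfer: 0.33 ms vs round trip: 100 ms
	// The fixed latency dwarfs the payload transfer time for a small file.
}
```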

You are suggesting adding to that process the time needed to download a large log and unpack the small file from it. It would be significantly worse.

I’d argue satellites are the safest place to keep files. If a node dies – nothing happens. If a satellite dies – everything happens.

Satellites are redundant and distributed just like (if not better than) nodes.

Isn’t that suggestion already in the roadmap?

Ouh, I didn’t see that… or maybe I did, and a brain elf reminded me somehow… :sweat_smile:

They can, but it depends on the customer. Our link sharing service allows access to a single file from a zip archive, and it’s pretty fast.
So the main suggestion for customers with a lot of small files is to pack them into a zip archive and upload that, then use them on their website, for example, without any issues or struggling.
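
A minimal sketch of that manual approach, assuming the Go uplink library (storj.io/uplink); the bucket name, object key, and local directory are placeholders:

```go
package main

import (
	"archive/zip"
	"context"
	"log"
	"os"
	"path/filepath"

	"storj.io/uplink"
)

func main() {
	ctx := context.Background()

	// Parse an access grant, assumed here to be in the ACCESS_GRANT env var.
	access, err := uplink.ParseAccess(os.Getenv("ACCESS_GRANT"))
	if err != nil {
		log.Fatal(err)
	}
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		log.Fatal(err)
	}
	defer project.Close()

	// One upload for the whole archive, instead of one per small file.
	upload, err := project.UploadObject(ctx, "my-bucket", "small-files.zip", nil)
	if err != nil {
		log.Fatal(err)
	}

	// Stream a zip archive straight into the upload.
	zw := zip.NewWriter(upload)
	names, err := filepath.Glob("./small-files/*")
	if err != nil {
		log.Fatal(err)
	}
	for _, name := range names {
		w, err := zw.Create(filepath.Base(name))
		if err != nil {
			log.Fatal(err)
		}
		data, err := os.ReadFile(name)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write(data); err != nil {
			log.Fatal(err)
		}
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	if err := upload.Commit(); err != nil {
		log.Fatal(err)
	}
}
```

The link sharing service can then serve individual entries from the uploaded archive, so the files stay individually accessible.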

The linked GitHub roadmap ticket proposes doing this automagically.

I was thinking only about the Storj side, not the customer’s. Like… the software could pack small files from different customers, not just one, and process those bigger files.
But AR pointed out the problems…

No, it would become much worse. The linked ticket suggests packing such small files together with their metadata. Not sure it’s possible to implement, though.

It might be an acceptable trade-off for some customers, so that they do not need to pay an excessive per-segment fee.

This per-segment fee was introduced not only to cover the costs of inline segments, but also to prevent abuse: you could store a million small objects of almost zero size, where the metadata is still larger than the object itself, or upload a 1 kB object in 1-byte chunks, etc.
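
To make the abuse cases concrete, a back-of-envelope sketch; the fee value below is a hypothetical placeholder, not current Storj pricing:

```go
package main

import "fmt"

func main() {
	// Hypothetical rate for illustration only; check Storj's current
	// pricing for the real per-segment fee.
	const feePerSegmentMonth = 0.0000088 // USD per segment per month

	// Abuse case 1: a million near-empty objects, one segment each.
	fmt.Printf("1M tiny objects: $%.2f/month in segment fees alone\n",
		1_000_000*feePerSegmentMonth)

	// Abuse case 2: a 1 kB object uploaded in 1-byte chunks, so each
	// byte ends up as its own segment (1024 segments for 1 kB of data).
	fmt.Printf("1 kB in 1-byte chunks: $%.6f/month for one object\n",
		1024*feePerSegmentMonth)
}
```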
