Tardigrade: size of file uploads

As far as I understand it, each file is divided into pieces of 64 MB and those pieces get uploaded to different nodes.
Very small files get stored on the satellites directly.

So what happens when I store my backups on tardigrade with multiple files in the range of 1-60MB?
Will those get stored on nodes or the satellite? What is the size that determines if a file gets stored on the satellite or on a storagenode?
Am I paying for 64MB or for the actual file size?

AFAIK satellites store files smaller than 4 KB.


64 MB is the default maximum block size; if a file is bigger, it will be divided by this size.

@kevink
you are right to question it; there is always some overhead related to files depending on the metadata they have. it can be a very complex subject… i doubt there is a huge overhead, but you will most likely need to consult the tardigrade documentation for a proper answer.

or wait until somebody wise and bored enough comes along to give you an exact answer.

if i was to hazard a guess, the system is smart enough that there is as little overhead as possible on the files, so you pay for exactly the amount of capacity they actually take. though, like with any file system, there might be a minimum file size… most likely in the 4 KB or 8 KB range. but yeah, consult the tardigrade documentation.

i checked, but couldn’t find anything useful.
it might not be an issue unless you are dealing with something like millions of files… otherwise i doubt you will have to worry about it.
this is the only information i found that comes close to being relevant:
https://documentation.tardigrade.io/concepts/cost-optimization

Yeah, the documentation says: “Users are not charged for the expansion factor when storing data; they are only charged for the actual size of the file and nominal overhead related to encrypting it”
But that is about all I found.

It is logical to assume that I pay for the actual file size plus a small overhead.
However, I was wondering how files are split between the satellite and the nodes, and what happens to a 4 MB file (will it be stored as a 4 MB file on nodes, or as part of a 64 MB piece, …?).

think of it this way…
you have a block that’s 64 MB, and inside it you can write whatever you want… which can be 1 file or multiple files. until you use the 64 MB completely, it’s just a single block of 64 MB on the servers. but a 64 MB block isn’t always forced to be 64 MB… it can be dynamic, so it might have a minimum size of 1 MB and a max of 64 MB. if you put in some files, it will store the “file system” in the first block, which it will then read first when you load your “folder” to see where everything is…
the files are then also added inside the 64 MB block until it spills over into the next and the next and the next…
so if you try to retrieve a certain file, it will first read data in the first block to see where the file is, and then go to the related block for whatever you are trying to retrieve.

if you want to really go in depth about it, it’s called object storage… it is not a storj technology, but a pretty revolutionary approach used in network/SAN/DAS/RAID/cluster-like systems.
i believe that is basically what storj is using to help make sense of it all… a sort of guide book on how to make it all work a bit easier.

there are layers upon layers upon layers when using stuff like storj, or even when saving a file on your local hard drive… it gets very complicated, and most of it most likely doesn’t have much relevance to what you are doing…

most likely it doesn’t matter; you will be charged for the space used, not for filesystem overhead, because in modern storage solutions those issues are near nonexistent.

but it would be a perfectly viable question to pose to a storjling, to give them a headache lol

Ok, first some terminology. A file is split up into 64 MB segments, each of which is then erasure coded into 80 pieces. Those pieces would be roughly 2.3 MB in size. I skipped a few steps in between as they are not relevant to this discussion.
So there’s never anything like a 64 MB piece.

With that out of the way, uploading a 4 MB file would not result in any further splitting into segments. You just end up with a single 4 MB segment, which is erasure coded and split into 80 pieces of roughly 0.14 MB each.
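That arithmetic can be sketched quickly. The 29-of-80 Reed-Solomon parameters below are the commonly cited Storj defaults (29 pieces needed to reconstruct a segment), so treat them as an assumption rather than guaranteed current values:

```python
# Rough piece-size math for a Tardigrade upload.
# Assumption: Reed-Solomon with k=29 required pieces out of n=80 total,
# and a 64 MB maximum segment size (commonly cited defaults).
K, N = 29, 80
MAX_SEGMENT_MB = 64

def piece_size_mb(file_mb: float) -> float:
    """Approximate size of each of the N pieces for one segment."""
    segment_mb = min(file_mb, MAX_SEGMENT_MB)
    return segment_mb / K

print(round(piece_size_mb(64), 2))  # full 64 MB segment -> ~2.21 MB pieces
print(round(piece_size_mb(4), 2))   # 4 MB file -> ~0.14 MB pieces
```

Note that each piece is segment_size / k, not segment_size / n: only k pieces are needed to rebuild the data, and the remaining pieces are redundancy (the "expansion factor" the documentation mentions you are not billed for).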

As you know, you’re paying for storage of the original file size (after encryption) and for egress bandwidth. What’s a little less known is that there is also a per-object fee. This is mostly intended to discourage people from uploading tons of minuscule files. It’s so low that for almost anyone it can simply be ignored.

Per Object Fee - Charged at $0.0000022 per file stored. This charge ensures that users are incentivized to store data that is larger and optimized for storage on the network. Decentralized cloud storage is best suited to the storage of large static objects.

Tardigrade stores object metadata on the satellites. There is no file system block or anything. There technically aren’t even folders. Tardigrade handles paths as file prefixes. So you don’t ask it what files are in a folder, you technically ask it which objects have these prefixes. It works largely the same as folders with some exceptions. You can’t have any empty folders, since by definition there can’t be a prefix without an object.

So you’re technically paying a per object fee for the metadata overhead. But it’s negligible. The more important part is that you would have pretty bad performance with many small files compared to fewer large files. And this cost structure incentivizes you to avoid that situation.
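The prefix idea described above can be illustrated without any real API. The object keys below are hypothetical and the helper is just a plain string filter, not a Tardigrade library call:

```python
# Sketch of prefix-based listing: "folders" are just shared key prefixes.
# Hypothetical object keys; no real Tardigrade/uplink API is used here.
objects = [
    "backups/2020/january.tar",
    "backups/2020/february.tar",
    "backups/readme.txt",
    "photos/cat.jpg",
]

def list_prefix(keys, prefix):
    """Return all object keys under a prefix, like listing a folder."""
    return [k for k in keys if k.startswith(prefix)]

print(list_prefix(objects, "backups/2020/"))
# An "empty folder" is impossible in this model: a prefix only exists
# because some object key starts with it.
```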


your answer makes me think of how much trouble i had with handling OpenNMS.
OpenNetworkMonitoringSystem, or whatever it’s called… anyways, it consists of something like 200k to 400k files… moving that around is just hell, mainly because i was using regular hdds.

ended up using it as a disk performance test… because it was ridiculous to try and copy it between drives. it’s mainly an issue with spinning drives… i doubt even a first-gen ssd would choke on it much…

good answer, very interesting.
i find it a bit weird that this didn’t seem to be clearly stated in the tardigrade documentation.
alas, i may not have looked closely enough…

https://documentation.tardigrade.io/concepts/cost-optimization


Thanks a lot for the detailed answer! I had forgotten about the role of the erasure code in splitting up segments.
It also explains why my node has mostly ~2 MB files stored in blobs and almost no bigger ones.

I did read that article, but nowhere does it say how small a file has to be to be stored on the satellite instead of on the storagenodes.

It’s about 4 KB. This is subject to change; to cite the whitepaper:

Inline Segment: An inline segment is a segment that is small enough where the data it represents takes less space than the corresponding data a remote segment will need to keep track of which nodes had the data. In these cases, the data is stored “inline” instead of being stored on nodes.

So, if the size of the metadata grows, this number could grow too.
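The decision the whitepaper describes can be sketched as a simple threshold check. The ~4 KB cutoff is the approximate figure mentioned above; in reality the satellite compares the object's size against the metadata a remote segment would cost, so this constant is an assumption:

```python
# Sketch of the inline-vs-remote decision described in the whitepaper.
# Assumption: a fixed ~4 KB threshold, standing in for the real comparison
# against the size of a remote segment's node-tracking metadata.
INLINE_THRESHOLD_BYTES = 4 * 1024  # subject to change if metadata grows

def storage_location(size_bytes: int) -> str:
    """Where an object of this size would end up, per the whitepaper."""
    if size_bytes < INLINE_THRESHOLD_BYTES:
        return "inline (stored on the satellite with the metadata)"
    return "remote (erasure-coded into pieces on storage nodes)"

print(storage_location(1_000))      # a tiny file stays on the satellite
print(storage_location(4_000_000))  # a 4 MB file goes to storage nodes
```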


Thank you very much!