From reading up on StorJ, an expansion factor of 2.75 is given as the amount of extra data that has to be uploaded to cover the redundancy that comes with erasure coding. I would consider this fact a little hidden, as you really have to dig through the documentation to find it. But even if you do find it, that figure is far from the whole story of how much overhead comes from the distributed nature of StorJ.
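(For anyone wondering where 2.75 comes from: with a k-of-n erasure code you need any k pieces out of the n stored to rebuild a segment, so the minimum expansion is n/k. Plugging in the 29-of-80 numbers that come up later in this post gives roughly that figure; the snippet below is just that arithmetic, nothing StorJ-specific.)

```python
# Minimum expansion factor for a k-of-n erasure code: n pieces are stored,
# and any k of them can rebuild the segment, so you upload n/k times the data.
k, n = 29, 80          # the "required 29" and "80 chunks" mentioned below
print(round(n / k, 2)) # 2.76 -- essentially the 2.75 quoted in the docs
```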
In my testing, on upload, I’m seeing between 4 and 5x more data being sent than the size of the files themselves (uploading a 100MB file results in 400-500MB uploaded). From searching this forum, I understand this is because StorJ initiates more uploads to storage nodes than are necessary, and then kills the extra uploads as soon as the target number is reached. So in the worst case, all uploads finish at nearly the same time, and a killed connection has already transferred most of its chunk.
I think I read that this is done to mitigate “tail end” latency, or something of the sort: it prevents the situation where I need to upload 80 chunks, I’m uploading to 80 nodes, but a single node has an issue and my upload is left hanging. By “over-uploading”, a problematic node doesn’t matter, because I already have other nodes on the go.
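To make that concrete, here’s a rough sketch of the pattern as I understand it (plain Python/asyncio, not StorJ’s actual uplink code, and the count of extra uploads started is made up): start more piece uploads than you need, keep the first ones to finish, and cancel the stragglers. The cancelled connections are the wasted upload bytes.

```python
import asyncio
import random

PIECES_NEEDED = 80    # target number of stored pieces (from the erasure scheme above)
PIECES_STARTED = 95   # hypothetical: start extra uploads to dodge slow nodes

async def upload_piece(node_id: int) -> int:
    # Simulate a node with variable latency; a real client would stream bytes here.
    await asyncio.sleep(random.uniform(0.1, 2.0))
    return node_id

async def upload_segment() -> None:
    tasks = [asyncio.create_task(upload_piece(i)) for i in range(PIECES_STARTED)]
    finished_pieces = []
    for next_done in asyncio.as_completed(tasks):
        finished_pieces.append(await next_done)
        if len(finished_pieces) >= PIECES_NEEDED:
            break
    # Cancel the stragglers -- their partially transferred bytes are the overhead.
    for t in tasks:
        if not t.done():
            t.cancel()
    print(f"kept {len(finished_pieces)} pieces, "
          f"cancelled {PIECES_STARTED - len(finished_pieces)} in-flight uploads")

asyncio.run(upload_segment())
```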
This same technique is used for downloading: more chunks than required are fetched from storage nodes, and as soon as I get the required 29, any not-yet-complete downloads are killed. This also leads to more data being transferred than is necessary: 1.3x in my test.
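The 1.3x fits a simple bound: if the client starts downloads from m nodes and cancels as soon as 29 pieces arrive, the worst case is m/29 (every cancelled download nearly finished). I don’t know the real value of m, so the one below is just a guess that happens to land near what I measured.

```python
k = 29                  # pieces needed to reconstruct a segment (from above)
m = 38                  # hypothetical number of piece downloads started -- a guess
print(round(m / k, 2))  # ~1.31x worst-case download overhead, near the 1.3x I saw
```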
Ok, so, one of the benefits I believed StorJ offered was S3-competitive download speeds, in part due to the globally distributed nature of the storage nodes, but that hasn’t been my experience at all. For instance, downloading a 1.4GB file from Google Storage (Standard, regional), I averaged 63.2MB/s. That same file downloaded from StorJ averaged 22.7MB/s. I thought I might be CPU-limited, as I saw rclone (used for both tests, v1.55.1) spiking to 65% CPU (another drawback of StorJ). The Google Cloud region was Oregon, so I used a VM in Amsterdam, thinking this would give StorJ an edge, but the results were nearly identical.
Looking at StorJ’s “Common Use Cases”, pretty much anywhere performance is mentioned, I have to wonder how accurate StorJ is being. For instance, for “Large File Transfer” it says “High-throughput bandwidth takes advantage of parallelism for rapid transit”. But really? Sure, it’s very parallelized, but you need to send 4-5x more data than if you used S3 or similar.
I’m curious what other people think about this. I feel this overhead is quite hidden, and it could lead to a lot of disappointment for new users. I also wonder whether StorJ will have to consider letting users specify their own level of durability and performance. I’m convinced that files stored on StorJ are incredibly safe (satellites aside); maybe that is worth a 4-5x overhead to one customer. Another customer, though, may really want to take advantage of large file transfer speed, and could be allowed to configure their client to upload only the bare minimum number of chunks, understanding the risk they’re taking.
Thanks for reading. I think what’s happening here is really cool, and for whatever it’s worth, I’ve moved all my online backups to StorJ - so I’m not just here to be a hater.