I am wondering why the output bandwidth is higher than the storage I actually pulled

asdfgdredscfvfsad · May 10, 2023, 9:33pm

I used Rclone on my VPS to pull one whole bucket over S3; the bucket is 26.1GB. The dashboard shows 34.3GB of output bandwidth. That’s around 30% more than what I actually pulled.

Last month, I also pulled a bucket of around 50GB, and the bandwidth showed around 67.8GB output.

What’s funny is that the ratio is always around 131%-133% of what I actually got.

So what happened here? Is it a problem with my Rclone S3 setup or with Storj?

Alexey · May 11, 2023, 5:11am

rclone has a retry option, so it may retry parts/files that were not downloaded.
Please also check the settled bandwidth, not the allocated. Usually more bandwidth is allocated until all confirmed orders have been received from the nodes (24h).

jtolio · May 11, 2023, 1:15pm

Hi @asdfgdredscfvfsad! Your username approach is vastly superior to mine lol

Storj has a unique set of challenges (and solutions!) being a decentralized storage platform. One problem we have is that storage nodes across the board are highly variable in performance, but we don’t want our customer experience to be highly variable!

The way we solve this variability problem is we “overdownload” - we make a request for more pieces than we need. For any download we need 29 pieces of what you’ve requested, but we actually request 39 (which is just a little over 133%!) (there’s nothing special about these numbers other than that they came from our data science team’s model about network health and churn). Then, when the fastest 29 complete, we return the data. This allows us to be tolerant of an additional 10 nodes being slow.
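To make the arithmetic concrete, here is a rough sketch in Python. It is not Storj code, and the function names are made up for illustration. It treats 39/29 as a ceiling on billed egress, on the assumption that requests for the slowest pieces are cut short once the fastest 29 finish, so real ratios would land somewhat below it.

```python
# Piece counts quoted in this thread: 29 needed, 39 requested.
NEEDED_PIECES = 29
REQUESTED_PIECES = 39


def max_overdownload_ratio(needed=NEEDED_PIECES, requested=REQUESTED_PIECES):
    """Worst-case egress multiplier if every requested piece were fully downloaded."""
    return requested / needed


def observed_ratio(pulled_gb, billed_gb):
    """Ratio between what the dashboard reports and what was actually pulled."""
    return billed_gb / pulled_gb


ceiling = max_overdownload_ratio()
worst_case_gb = 26.1 * ceiling        # upper bound for the 26.1GB bucket
seen = observed_ratio(26.1, 34.3)     # figures reported in the first post

print(f"ceiling ratio:      {ceiling:.3f}")
print(f"worst-case egress:  {worst_case_gb:.1f} GB")
print(f"observed ratio:     {seen:.3f}")
```

With the numbers from the first post, the observed ratio (about 1.31) sits just under the 39/29 ceiling (about 1.345), which is consistent with overdownload rather than rclone retries being the main cause.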

We settled on this ratio because it seems to provide the best tradeoff between performance variability and costs (not just price), but it is an aspect of our system that might be surprising to someone who is used to downloading from one single location.

If we’ve made the wrong tradeoff decision for your use case, the client software can be tuned to choose a different “long tail tolerance”. A larger long tail tolerance uses more resources but is more tolerant of performance “long tails”; a smaller long tail tolerance uses fewer resources but is less consistent, performance-wise.
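The tradeoff can be illustrated with a toy simulation. This is only a sketch under assumptions that do not come from Storj: node latencies are drawn from a log-normal distribution with arbitrary parameters, and a download finishes when the 29th fastest piece arrives. The names `completion_time` and `simulate` are hypothetical.

```python
import random
import statistics

NEEDED_PIECES = 29  # pieces required to reconstruct the data (from this thread)


def completion_time(latencies, needed=NEEDED_PIECES):
    """Time at which the `needed`-th fastest piece has arrived."""
    return sorted(latencies)[needed - 1]


def simulate(requested_options=(29, 34, 39), trials=2000, seed=0):
    """Compare download times for different numbers of requested pieces.

    Each trial draws one latency per node (log-normal, an assumed model)
    and reuses the same draw for every option so the comparison is fair.
    """
    rng = random.Random(seed)
    times = {requested: [] for requested in requested_options}
    most = max(requested_options)
    for _ in range(trials):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(most)]
        for requested in requested_options:
            times[requested].append(completion_time(sample[:requested]))
    return {
        requested: (statistics.mean(t), statistics.pstdev(t))
        for requested, t in times.items()
    }


summary = simulate()
for requested, (mean, spread) in summary.items():
    extra = requested - NEEDED_PIECES
    print(f"requested={requested} (tolerance {extra}): "
          f"mean={mean:.2f}, stddev={spread:.2f}")
```

Requesting exactly 29 pieces means waiting on the slowest node every time, so the mean time is highest there. Each extra requested piece lets the download ignore one more straggler, at the cost of more egress.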

I hope this explanation makes sense - it’s probably something we should do a better job of explaining in the app. There is more information in our docs here: How Billing is Calculated - Storj DCS Docs

asdfgdredscfvfsad · May 12, 2023, 2:44am

Thank you! That explained a lot of things!

But could you please point me to the documentation on how to choose a different long tail tolerance? I have checked the Download an Object page in the uplink CLI docs and the Rclone Hosted Gateway docs, and neither of them covers it. I’m a little bit lost.

P.S. Funny story about my username: a repeated username once broke my own program (my mistake). Ever since, I just randomly mash my keyboard to generate usernames, even though I have fixed that mistake.

Alexey · May 12, 2023, 8:11am

You may use the --long-tail-margin option.
See storj/cmd_cp.go at c2710cc78dc863b642aabb5c76844659c53f81b6 · storj/storj · GitHub

jtolio · May 12, 2023, 1:47pm

That actually is only for upload! It appears I misspoke and there isn’t an easy way to control that value for download that I can see. We’ll need to get this added.

Alexey · May 13, 2023, 1:06am

It would be nice to show it in the uplink cp --help command too.