To validate, I also restored a very large website with millions of files, and the results matched the above: the ratios were the same, give or take.
STORJ is a centralized S3 with decentralized nodes as storage backbone.
So basically they have an S3 with slightly cheaper storage, but that is not what is expensive about S3.
Yeah, not surprising.
Apparently the native integration with some tunings is “fast”.
Storj works best for large transfers; apps that package data into reasonably sized blobs work fast. I don’t know how Comet works, but if it just copies data as is without packaging, it will be ridiculously slow on any backend, and especially so on STORJ, due to the higher front latency.
The native integration is much faster than S3, but requires a stable, high-performance connection (not in terms of bandwidth, but latency and IOPS).
S3 is centralized and its performance depends on only two factors: how much you are willing to pay for peering and where you are trying to access the data from. Storage speed is not the bottleneck.
So if you use it from home, S3 will be slow, because STORJ has bad S3 peering.
That is what @Slewrate tested here. I don’t know what ISP @Slewrate uses, but for his ISP (and many other ISPs according to other users and my testing) S3 performance from STORJ is pretty bad.
If you use a DigitalOcean VPS and their S3, it will be faster than STORJ because it sits right next to it and thus will have perfect peering. So STORJ not having compute is a downside. STORJ will never be able to offer 100GB/s like Amazon does. But that is manageable; Backblaze and others also do not offer compute.
STORJ having bad S3 performance (and being very expensive) is not even up for debate. Even STORJ employees agree.
That is why in every performance discussion we have in the forum, they will try to nudge you to the native integration. That is not a critique! This is just how the technology works and STORJ will never be able to change that. That is why they have to ditch S3 sooner or later.
Even if @IsThisOn is correct about peering, in the case of Storj the S3 Gateway is a distributed service too, so when you use GatewayMT with the global endpoint gateway.storjshare.io, you communicate with the closest instance for your location (the same happens for clients in other locations).
The speed would be roughly the same in almost any location, but of course to have a stable speed you need to use the native integration.
The main point here: you should use bigger chunks (preferably not less than 64MiB) and a high-bandwidth connection. If your upstream or downstream is low (from less than 40Mbit up to 1Gbit, YMMV), then the S3 integration would likely be faster (this needs testing for your location); above 1Gbit, native will likely be faster.
Storj works better when you use it for parallel transfers: you can saturate almost any connection if you transfer dozens of objects in parallel, especially big files (where you can also transfer chunks of each file in parallel).
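To make the "bigger chunks, in parallel" advice concrete, here is a minimal Python sketch. It only plans the byte ranges and fans them out over a thread pool; `plan_parts`, `transfer_parallel` and the `send_part` callback are hypothetical names for illustration, not part of any Storj or S3 client, and the 64MiB default is simply the figure suggested above.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


def plan_parts(total_size, part_size=64 * MIB):
    """Return (offset, length) pairs that cover total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [
        (offset, min(part_size, total_size - offset))
        for offset in range(0, total_size, part_size)
    ]


def transfer_parallel(parts, send_part, workers=8):
    """Call send_part(offset, length) for every part on a thread pool.

    send_part is supplied by the caller (hypothetical: it would wrap
    whatever upload or download call your client library provides).
    Results are returned in the same order as parts.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda part: send_part(*part), parts))
```

For example, `transfer_parallel(plan_parts(file_size), my_upload_part, workers=16)` would run up to 16 part transfers at once, where `my_upload_part` is your own function. The right worker count depends on your connection and hardware, so it needs testing.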
Sure it is not only one physical endpoint.
Backblaze also has more than one endpoint.
That does not make it decentralized.
We don’t say Backblaze is decentralized, because they have multiple endpoints.
So it is slower than the competition in almost any location? Or is Europe the bad outlier?
What is your traceroute or mtr output to gateway.storjshare.io? Example commands:
traceroute gateway.storjshare.io or mtr --report-wide gateway.storjshare.io (if you have mtr installed)
What Storj satellite do you use for your location?
We have S3 gateways in a number of locations across the world now, and we’re always looking to fix any outliers in BGP anycast routes that look strange.
Note that even the big players can get this wrong sometimes. There’s no such thing as perfect peering for everyone; it’s highly dependent on your ISP as well.
Where do you see that I said “decentralized” for the S3 Gateway?
Sometimes Germany suffers because of Deutsche Telekom (and all others who use their services); transit providers are often in conflict with each other. So, yes, it’s possible.
Depends on your location and the protocol used (and sometimes hardware). With Storj native, it will be as fast as your bandwidth allows and would likely be the same in any location (accounting for an overhead of 2.68x for upload and up to 39/29 ~ 1.34x for download: libuplink opens 39 connections to download a segment of the file and cancels all remaining ones once the first 29 have finished).
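As a rough illustration of what those expansion factors mean for usable throughput, here is a small Python sketch. The constants are just the figures quoted in this post (2.68x for upload, up to 39/29 for download); the function names are made up for the example, and real-world results will also depend on latency, hardware and routing.

```python
# Figures quoted in the post above; treat them as assumptions, not as
# values read from any Storj API or documentation.
UPLOAD_EXPANSION = 2.68      # bytes sent on the wire per payload byte
DOWNLOAD_CONNECTIONS = 39    # connections opened per segment
DOWNLOAD_NEEDED = 29         # connections that must finish


def download_expansion_max():
    """Worst-case download overhead: all 39 streams run until 29 finish."""
    return DOWNLOAD_CONNECTIONS / DOWNLOAD_NEEDED


def effective_upload_mbit(link_mbit):
    """Payload upload rate if the link is fully saturated by native uploads."""
    return link_mbit / UPLOAD_EXPANSION


def effective_download_mbit_worst(link_mbit):
    """Payload download rate in the worst case of the 39/29 overhead."""
    return link_mbit / download_expansion_max()


if __name__ == "__main__":
    for link in (40, 1000):
        up = effective_upload_mbit(link)
        down = effective_download_mbit_worst(link)
        print(f"{link} Mbit link: ~{up:.1f} Mbit upload payload, "
              f">= ~{down:.1f} Mbit download payload")
```

The download figure is a lower bound, because the extra 10 connections are cancelled as soon as the first 29 complete and so rarely transfer their full share.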
It also depends on what you use: Storj Global, Storj Select or Storj Private Cloud (SPC), and the protocol, of course. For Storj Global you will get the fastest speed for your location using the S3 integration, and the fastest speed for your location and hardware using Storj native.
If that was not intended on your part, consider my comment on it a small note.
Telekom is a joke, but it is slow even for other ISPs with great peering to DE-CIX.
That is moving the goalposts. I was specifically talking about S3. I even said that native is faster, but we are talking about S3 here, not native. Please stop comparing apples to oranges.
OP @Slewrate did an S3 benchmark.
Can’t we just be honest and say:
“Yes, STORJ S3 is slower than the competition, but there is a faster alternative if you use native integration”?
Simple as that. Then the user can decide whether native integration is even a possibility for his/her use case.
As far as I can see, I used the word “distributed” together with “too”, which means that our S3 Gateway is distributed too, like the Storage Nodes.
See the point?
Just NOT AS DISTRIBUTED as the storage nodes (native).
Yeah… but they are the biggest one… troubles…
OK. For S3 it’s still distributed, but it could be affected by peering between those big players, you know…
For some cases. As I said, it “depends” on your channel, hardware and location. This is true for native too, by the way, but not as heavily.
But that leads nowhere; we can split hairs about the meaning of decentralized and distributed all night long.
My main point is this:
Your communication comes off as weirdly defensive and quite dishonest. If a user has performance problems, this forum always tries to blame the user. Or we simply bury him under links to a wall of text. Because you are not upfront in your communication, the user finds out during that read that the text is not about S3 at all. What if the user has to use S3? There are multiple situations I can think of where native is not an option.
So why is it so hard to be honest and say:
“Yes, STORJ S3 is slower than the competition, but there is a faster alternative if you are able to use native integration”
Actually, both. That’s the point. Yes, we have far fewer S3 gateway instances than nodes… but every gateway connects to 110 nodes around the world for each segment of each customer. It’s still distributed, even if you use the closest S3 gateway for your location. That’s the huge difference: you always upload to the nodes, not to a central server (even if you report to the central service, which is… surprise, distributed too), and download from the nodes. But yes, through the closest S3 Gateway instance (actually, if you are close enough to several instances, or are moving, or the routing changes, parallel transfers could use separate gateway instances).
You are simply wrong.
This information ^ can help with the
And also, if they are a paying customer, they have the option to use Storj Select or Storj Private Cloud if Storj Global is not an option for any reason.
For your location. That’s the main and complete point.
If the customer can share their location, and if they are a paying customer, the team can try to fix the peering issue (if that is the cause), and also help the customer configure their software for their case.
We want a dialogue, not to assign blame, as you believe.
Looking at them, it appears some ISPs in Europe using Tata (AS6453) as a transit provider are sending traffic to our servers in New York, instead of Frankfurt. We should have a direct connection to Tata on the Frankfurt server, but it doesn’t appear to be working. We’re looking into that now.
The Amazon trace going from US West to Frankfurt is a known issue, which we are investigating as well.