First, let me preface this by saying that it may not be a Storj issue at all; however, I’m a bit unsure how to debug it.
One of the users of my app is reporting very slow file uploads. They’re on a 500/500 connection, so uploads should be very quick; however, they’re reporting upload times about twice as long as those on my 50/20 connection.
I asked the user for a traceroute to the gateway, and they’ve come back with:
1 <1 ms <1 ms <1 ms 2a02:6b67:64a3:0:daec:5eff:fe7d:c9f
2 2 ms 2 ms 2 ms 2a02:6b60:0:1
3 2 ms 2 ms * lag-7.acc1.thu.lon.network.as201838.net [2a02:6b60:0:1:4::13]
4 3 ms 2 ms 2 ms xge-0-2-1.acc1.ld8.lon.network.as201838.net [2a02:6b60:0:1:1::72]
5 2 ms 2 ms * lag-4.agg-rr1.avo.lon.cisco.network.as201838.net [2a02:6b60:0:1:1::229]
6 * * * Request timed out.
7 * * * Request timed out.
8 * * * Request timed out.
9 69 ms 69 ms 69 ms 100ge4-1.core1.nyc4.he.net [2001:470:0:2cf::2]
10 * * * Request timed out.
11 78 ms 79 ms 79 ms storj-labs-inc.e0-6.switch1.tor3.he.net [2001:470:1:209::2]
12 78 ms 79 ms 79 ms 2a10:c640:0:1::4
You can see a few dropouts here. They’re based in London and the ISP is CommunityFibre. They mentioned the traceroute taking a long time to run.
I obtained two other traceroutes from users in London. This one is on Sky:
traceroute: Warning: gateway.storjshare.io has multiple addresses; using 220.127.116.11
traceroute to gateway.storjshare.io (18.104.22.168), 64 hops max, 52 byte packets
1 skyrouter (192.168.0.1) 5.668 ms 5.074 ms 27.939 ms
2 * * *
3 02780b94.bb.sky.com (22.214.171.124) 408.165 ms 13.441 ms 11.024 ms
4 126.96.36.199 (188.8.131.52) 9.614 ms 13.388 ms *
5 e0-36.cr2.lon2.gb.unitasglobal.net (184.108.40.206) 83.266 ms 10.884 ms 9.321 ms
6 e0-15.cr1.fra1.de.unitasglobal.net (220.127.116.11) 22.445 ms 21.415 ms *
7 ae1-3.cr2.fra1.de.unitasglobal.net (18.104.22.168) 21.113 ms 20.702 ms 20.836 ms
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
And this one is on Plusnet (BT):
traceroute: Warning: gateway.storjshare.io has multiple addresses; using 22.214.171.124
traceroute to gateway.storjshare.io (126.96.36.199), 64 hops max, 52 byte packets
1 dsldevice (192.168.1.254) 2.478 ms 4.105 ms 2.087 ms
2 * * *
3 * * *
4 140.hiper04.sheff.dial.plus.net.uk (188.8.131.52) 7.622 ms
136.hiper04.sheff.dial.plus.net.uk (184.108.40.206) 7.633 ms 7.355 ms
5 peer8-et-7-0-5.telehouse.ukcore.bt.net (220.127.116.11) 7.028 ms
peer8-et-7-0-2.telehouse.ukcore.bt.net (18.104.22.168) 7.707 ms
peer8-et-0-1-7.telehouse.ukcore.bt.net (22.214.171.124) 7.108 ms
6 126.96.36.199 (188.8.131.52) 7.433 ms 8.063 ms
184.108.40.206 (220.127.116.11) 10.993 ms
7 e0-36.cr2.lon2.gb.unitasglobal.net (18.104.22.168) 7.751 ms 8.988 ms
e0-14.cr2.lon1.gb.unitasglobal.net (22.214.171.124) 9.573 ms
8 * e0-15.cr1.fra1.de.unitasglobal.net (126.96.36.199) 22.514 ms 18.324 ms
9 e0-15.cr1.fra1.de.unitasglobal.net (188.8.131.52) 20.114 ms
ae1-3.cr2.fra1.de.unitasglobal.net (184.108.40.206) 19.693 ms
e0-15.cr1.fra1.de.unitasglobal.net (220.127.116.11) 19.996 ms
10 ae1-3.cr2.fra1.de.unitasglobal.net (18.104.22.168) 20.914 ms
storj.cust.fra1.de.unitasglobal.net (22.214.171.124) 18.722 ms
ae1-3.cr2.fra1.de.unitasglobal.net (126.96.36.199) 20.359 ms
11 storj.cust.fra1.de.unitasglobal.net (188.8.131.52) 20.606 ms 19.747 ms 20.375 ms
12 * * *
13 * * *
The latter two don’t seem to have any timeouts (although that’s not 100% indicative of a problem).
Any ideas on what to try next to figure out why the performance is so bad for them? They don’t have any issues with other sites and play a lot of online games, so they’d be sensitive to a poor connection.
Initial thought: IPv6 vs IPv4?
I’m in London and my traceroute is not dissimilar:
That’s interesting: the successful ones seem to be IPv4-based.
Here’s my traceroute, though I’m based in Melbourne:
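Since the clean traceroutes went over IPv4 and the problematic ones over IPv6, one way to check whether the address family matters is to test each family separately. Below is a minimal sketch using only Python’s standard library. `addresses_by_family` and `tcp_connect_ms` are hypothetical helper names. The resolver is injectable so the grouping logic can be tested offline. A TCP handshake time is only a rough latency signal, not a throughput test.

```python
import socket
import time
from typing import Callable, Dict, List


def addresses_by_family(
    host: str,
    port: int = 443,
    resolver: Callable = socket.getaddrinfo,
) -> Dict[str, List[str]]:
    """Group the addresses `host` resolves to into IPv4 and IPv6 lists."""
    groups: Dict[str, List[str]] = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canon, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore any other address family
        addr = sockaddr[0]
        if addr not in groups[key]:  # getaddrinfo can repeat addresses
            groups[key].append(addr)
    return groups


def tcp_connect_ms(addr: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time one TCP handshake to a specific IPv4 or IPv6 address, in ms."""
    start = time.perf_counter()
    with socket.create_connection((addr, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0
```

For example, run `addresses_by_family("gateway.storjshare.io")` from an affected connection, then call `tcp_connect_ms` on each returned address. This shows whether one family is consistently slower. For an end-to-end comparison, uploading the same file with `curl -4` and then `curl -6` forces IPv4 or IPv6 respectively.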
1 1 ms 1 ms 1 ms 2404-e80-613e-1-ae84-c6ff-fe31-3e72.dyn6.launtel.net.au [2404:e80:613e:1:ae84:c6ff:fe31:3e72]
2 4 ms 3 ms 3 ms 2404-e80-691d-d2-a55-31ff-fed3-59cf.dyn6.launtel.net.au [2404:e80:691d:d2:a55:31ff:fed3:59cf]
3 3 ms 2 ms 2 ms core02-m1s.launtel.net.au [2404:e80:6000:516::17]
4 3 ms 2 ms 4 ms 2401:3cc0:200:501::
5 3 ms 2 ms 2 ms 2401:3cc0::4:142
6 13 ms 13 ms 13 ms 2401:3cc0::4:68
7 145 ms 146 ms 146 ms 2401:3cc0::4:a8
8 146 ms 146 ms 145 ms 2401:3cc0:1009:2::1
9 146 ms 148 ms 147 ms 2001:438:fffe::795
10 * * * Request timed out.
11 205 ms 204 ms 204 ms 2a10:c640:0:1::4
I’m IPv6-based but have noticed no problems. This one’s a real head-scratcher; I’m not even sure where to look.
Wow, thanks for this. I have no real idea how to read these traceroutes, so this is incredibly helpful. Thank you so much.
I know the “* * *” responses mean the hop isn’t responding, but is “Request timed out” also a sign that something else is amiss? (I notice some “* * *” responses don’t have the timeout message.)
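On the question above: Windows `tracert` prints “Request timed out.” when every probe to a hop goes unanswered. Unix `traceroute` prints only `* * *` in the same situation, so the two mean the same thing. A lone `*` among replies means one probe went unanswered.

A silent hop followed by hops that do answer usually means that router doesn’t reply to (or rate-limits) traceroute probes. Traffic is still passing through it. A run of silent hops at the very end often just means the far side filters the probes.

Below is a minimal Python parser that separates those two cases. It assumes the two output formats pasted in this thread, and all names in it are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# A hop line starts with the hop number followed by whitespace.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# A round-trip time such as "69 ms", "7.622 ms" or "<1 ms"; the lookbehind
# stops it from matching digits inside an IP address.
RTT_RE = re.compile(r"(?<![\d.])(\d+(?:\.\d+)?)\s*ms\b")
# A standalone "*": one probe that got no reply within the timeout.
STAR_RE = re.compile(r"(?<!\S)\*(?!\S)")


@dataclass
class Hop:
    number: int
    rtts_ms: List[float] = field(default_factory=list)
    timeouts: int = 0

    @property
    def silent(self) -> bool:
        """True when no probe to this hop got a reply."""
        return not self.rtts_ms and self.timeouts > 0

    @property
    def best_ms(self) -> Optional[float]:
        return min(self.rtts_ms) if self.rtts_ms else None


def parse_traceroute(text: str) -> List[Hop]:
    """Parse Windows tracert or Unix traceroute output into hops."""
    hops: List[Hop] = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append(Hop(int(m.group(1))))
            rest = m.group(2)
        elif hops:
            # Unix traceroute lists extra responders for the same hop on
            # continuation lines without a hop number.
            rest = line
        else:
            continue  # header lines before the first hop
        hops[-1].rtts_ms.extend(float(v) for v in RTT_RE.findall(rest))
        hops[-1].timeouts += len(STAR_RE.findall(rest))
    return hops


def classify_silence(hops: List[Hop]) -> Dict[str, List[int]]:
    """Split silent hops into mid-path ones and a trailing run at the end."""
    answered = [i for i, h in enumerate(hops) if h.rtts_ms]
    last = answered[-1] if answered else -1
    return {
        # Later hops still answer, so traffic is getting past these routers.
        "mid_path": [h.number for h in hops[: last + 1] if h.silent],
        # Nothing answers from here on (often filtering at the far end).
        "trailing": [h.number for h in hops[last + 1 :]],
    }
```

Applied to the CommunityFibre trace, hops 6 to 8 and 10 land in `mid_path`, because hops 9, 11 and 12 still answer. Applied to the Plusnet trace, hops 12 and 13 land in `trailing`.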
Again, thank you!
Ok, this is very strange.
I just tested uploads to the gateway from a friend’s connection (in Melbourne), and he’s also seeing very, very slow uploads. His connection is 100/50 Mbit/s, but he’s getting probably about 10% of the upload speed he should (an estimated 1 hour to upload a 1 GB file).
I’m kind of at my wits’ end here trying to work out why some connections are seeing great speeds to the gateway and others are seeing terrible speeds.
If people have time, could I ask you to try uploading a dummy file to the following endpoint and report the speed?
No, I’m not tracking the speed (yet). Did it complete fairly quickly, though (under ten minutes for 1 GB)?
I can see the files coming through, around 100 MB each. You should have seen that upload at nearly your maximum connection speed.
912 MB in 2m 40s from Poland... wow, pretty fast.
Data from speedtest.net:
Testing download speed........................................
Download: 489.38 Mbit/s
Testing upload speed..................................................
Upload: 62.01 Mbit/s
For comparison, the same file on https://transfer.sh/:
912 MB in 11m 35s
My hunch is that this is an IPv6 routing issue.
Location: Denmark
File: 1GB-Th3Van-Denmark-002.zip (1,073,741,824 bytes)
Upload time: ~53 sec.
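To put the reported numbers on a common scale, the sketch below converts each upload reported in this thread into an average rate. It assumes decimal units (1 MB = 10^6 bytes, 1 Mbit/s = 10^6 bit/s) and ignores protocol overhead. The Melbourne entry is the one-hour estimate for a 1 GB file, not a measurement.

```python
# Assumptions: decimal units (1 MB = 10**6 bytes, 1 Mbit/s = 10**6 bit/s),
# no allowance for protocol overhead, and the Melbourne entry is the user's
# one-hour estimate for a 1 GB file rather than a measured transfer.

def mbit_per_s(size_bytes: float, seconds: float) -> float:
    """Average transfer rate in Mbit/s."""
    return size_bytes * 8 / seconds / 1_000_000


# label: (bytes transferred, seconds taken), as reported in this thread
REPORTS = {
    "Poland -> gateway": (912_000_000, 2 * 60 + 40),
    "Poland -> transfer.sh": (912_000_000, 11 * 60 + 35),
    "Denmark -> gateway": (1_073_741_824, 53),
    "Melbourne -> gateway (estimate)": (1_000_000_000, 60 * 60),
}
SPEEDTEST_UPLOAD_POLAND = 62.01  # Mbit/s, from the speedtest.net result above

rates = {label: mbit_per_s(*report) for label, report in REPORTS.items()}

for label, rate in rates.items():
    print(f"{label:<32} {rate:7.1f} Mbit/s")

share = rates["Poland -> gateway"] / SPEEDTEST_UPLOAD_POLAND
print(f"Poland gateway upload reached {share:.0%} of the speedtest upload rate")
```

Under those assumptions, the rates work out as follows:

- Polish upload to the gateway: about 45.6 Mbit/s, roughly three quarters of its 62 Mbit/s speedtest upload.
- Same file to transfer.sh: about 10.5 Mbit/s.
- Danish upload: about 162 Mbit/s.
- Melbourne estimate: about 2.2 Mbit/s, only a few percent of its 50 Mbit/s upstream.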
For context, here’s how this upload system works:
When you upload a file, I split it into chunks based on its size and generate a signed URL for each chunk on Storj’s gateway.
The browser uploads each of the parts, with three uploads running concurrently.
As each upload completes, I store its ETag.
After all uploads complete, the browser notifies my server, and I instruct the gateway to assemble the parts into the final file.
So at no point in this process are any uploads traversing my server.
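The flow above is essentially an S3-style multipart upload driven by signed URLs. The sketch below models it in Python under these assumptions:

- `plan_parts`, `upload_all` and `put_part` are hypothetical names.
- A thread pool with three workers stands in for the browser’s three concurrent uploads.
- The 100 MiB target part size is an arbitrary illustrative choice.
- The 5 MiB minimum part size and 10,000-part cap are AWS S3 limits that may not match Storj’s gateway exactly.

The list it returns has the PartNumber/ETag shape that S3’s CompleteMultipartUpload request expects.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumed S3-style limits; check Storj's documented gateway limits before relying on them.
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum for every part except the last
MAX_PARTS = 10_000               # maximum number of parts per upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)


def plan_parts(file_size: int, target_part_size: int = 100 * 1024 * 1024) -> List[Part]:
    """Split a file into numbered byte ranges, growing parts for huge files."""
    part_size = max(MIN_PART_SIZE, target_part_size, math.ceil(file_size / MAX_PARTS))
    return [
        (number, offset, min(part_size, file_size - offset))
        for number, offset in enumerate(range(0, file_size, part_size), start=1)
    ]


def upload_all(
    parts: List[Part],
    put_part: Callable[[int, int, int], str],
    concurrency: int = 3,
) -> List[Dict[str, object]]:
    """Upload parts with at most `concurrency` in flight and collect ETags.

    put_part(number, offset, length) stands in for the browser's PUT to the
    signed URL for that part and returns the ETag response header.
    """
    def upload(part: Part) -> Dict[str, object]:
        number, offset, length = part
        return {"PartNumber": number, "ETag": put_part(number, offset, length)}

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in input order, so the list is already sorted
        # by PartNumber, as the completion request requires.
        return list(pool.map(upload, parts))
```

Because every byte goes straight from the browser to the gateway, upload speed depends only on the route between the user and the gateway. That is consistent with the traceroute-focused debugging in this thread.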
I hope this finds you well. On Friday, after we saw your post here on the forum, it caught the interest of one of our developers. We started an internal investigation to see whether there was something on our end, potentially within gateway-mt. We did a similar investigation to yours, but gathered traceroutes from various locations around the globe, and saw some instances of intermittent communication issues, mostly in the form of packet loss and high ping times.
From what we could tell, the timeouts and lengthy requests were occurring outside of the edge services. One of our infrastructure engineers noticed that these issues were occurring on IPv6 routes. I believe the internal investigation is still ongoing, and we will follow up if we find anything actionable or if we can verify that it is an external routing issue.
Great! Thanks for taking a look at it
Any updates on this?
I asked a couple of users who are suffering from slow uploads to the gateway to retest; they’re still seeing the issue.
Can you post (or DM me) users’ locations, their traceroutes and the speeds they were experiencing?
While we attributed most of the packet loss problems to some of our uplink providers last week, there may still be an issue with geographically non-optimal routing. We’re constantly testing our routes from tens of locations around the world.
While we officially support mainly the gateway.storjshare.io domain, we used to have domains pointing to specific regions, such as gateway.ap1.storjshare.io. Their implementation differs slightly from the domain we’re endorsing now. Can you test these domains as well? They might be an important data point for our investigation (as well as possibly a temporary workaround).
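A simple way to collect that data point is to time the same upload against each endpoint from the affected user’s network, since the result depends on their route. Below is a minimal Python sketch. `rank_endpoints` and `upload_once` are hypothetical names. `upload_once` is assumed to upload one fixed test object to the given endpoint, for example via a signed URL. Only the two domains named in this thread are listed.

```python
import statistics
import time
from typing import Callable, Dict, Iterable

# The two gateway domains mentioned in this thread.
ENDPOINTS = [
    "https://gateway.storjshare.io",
    "https://gateway.ap1.storjshare.io",
]


def rank_endpoints(
    endpoints: Iterable[str],
    upload_once: Callable[[str], None],
    runs: int = 3,
) -> Dict[str, float]:
    """Return median upload time in seconds per endpoint, fastest first.

    upload_once(endpoint) is a hypothetical callable that uploads the same
    test object to `endpoint` and returns once the upload has finished.
    """
    medians: Dict[str, float] = {}
    for endpoint in endpoints:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            upload_once(endpoint)
            samples.append(time.perf_counter() - start)
        # The median damps the effect of a single unusually slow run.
        medians[endpoint] = statistics.median(samples)
    return dict(sorted(medians.items(), key=lambda item: item[1]))
```

Running a few uploads per endpoint and comparing medians, rather than single runs, makes it less likely that one transient slow path decides the result.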
I’ve hardcoded the gateway to gateway.ap1.storjshare.io, and I’m seeing speed improvements for at least one user so far. I’m waiting for the other user who had the same issue to retest and will post again if they see an improvement as well.