Routing issues to gateway.storjshare.io

Let me preface this by saying it may not be a Storj issue at all; however, I’m unsure how to debug it.

One of the users of my app is reporting very slow file uploads. They’re on a 500/500 connection, so uploads should be very quick. However, their uploads take about twice as long as the same uploads take on my 50/20 connection.

I asked the user for a traceroute to the gateway, and they came back with:

  1    <1 ms    <1 ms    <1 ms  2a02:6b67:64a3:0:daec:5eff:fe7d:c9f
  2     2 ms     2 ms     2 ms  2a02:6b60:0:1
  3     2 ms     2 ms     *     lag-7.acc1.thu.lon.network.as201838.net [2a02:6b60:0:1:4::13]
  4     3 ms     2 ms     2 ms  xge-0-2-1.acc1.ld8.lon.network.as201838.net [2a02:6b60:0:1:1::72]
  5     2 ms     2 ms     *     lag-4.agg-rr1.avo.lon.cisco.network.as201838.net [2a02:6b60:0:1:1::229]
  6     *        *        *     Request timed out.
  7     *        *        *     Request timed out.
  8     *        *        *     Request timed out.
  9    69 ms    69 ms    69 ms  100ge4-1.core1.nyc4.he.net [2001:470:0:2cf::2]
 10     *        *        *     Request timed out.
 11    78 ms    79 ms    79 ms  storj-labs-inc.e0-6.switch1.tor3.he.net [2001:470:1:209::2]
 12    78 ms    79 ms    79 ms  2a10:c640:0:1::4

You can see a few dropouts here. They’re based in London and the ISP is CommunityFibre. They mentioned the traceroute taking a long time to run.

I obtained two other traceroutes from users in London. This one is on Sky:

traceroute: Warning: gateway.storjshare.io has multiple addresses; using 185.244.226.3
traceroute to gateway.storjshare.io (185.244.226.3), 64 hops max, 52 byte packets
 1 skyrouter (192.168.0.1) 5.668 ms 5.074 ms 27.939 ms
 2 * * *
 3 02780b94.bb.sky.com (2.120.11.148) 408.165 ms 13.441 ms 11.024 ms
 4 195.66.239.188 (195.66.239.188) 9.614 ms 13.388 ms *
 5 e0-36.cr2.lon2.gb.unitasglobal.net (50.115.91.37) 83.266 ms 10.884 ms 9.321 ms
 6 e0-15.cr1.fra1.de.unitasglobal.net (50.115.91.200) 22.445 ms 21.415 ms *
 7 ae1-3.cr2.fra1.de.unitasglobal.net (50.115.90.105) 21.113 ms 20.702 ms 20.836 ms
 8 * * *
 9 * * *
10 * * *
11 * * *
12 * * *

and this one is on Plusnet (BT):

traceroute: Warning: gateway.storjshare.io has multiple addresses; using 185.244.226.4
traceroute to gateway.storjshare.io (185.244.226.4), 64 hops max, 52 byte packets
 1  dsldevice (192.168.1.254)  2.478 ms  4.105 ms  2.087 ms
 2  * * *
 3  * * *
 4  140.hiper04.sheff.dial.plus.net.uk (195.166.143.140)  7.622 ms
    136.hiper04.sheff.dial.plus.net.uk (195.166.143.136)  7.633 ms  7.355 ms
 5  peer8-et-7-0-5.telehouse.ukcore.bt.net (62.172.103.178)  7.028 ms
    peer8-et-7-0-2.telehouse.ukcore.bt.net (109.159.252.100)  7.707 ms
    peer8-et-0-1-7.telehouse.ukcore.bt.net (194.72.16.146)  7.108 ms
 6  195.66.239.188 (195.66.239.188)  7.433 ms  8.063 ms
    195.66.224.81 (195.66.224.81)  10.993 ms
 7  e0-36.cr2.lon2.gb.unitasglobal.net (50.115.91.37)  7.751 ms  8.988 ms
    e0-14.cr2.lon1.gb.unitasglobal.net (50.115.90.161)  9.573 ms
 8  * e0-15.cr1.fra1.de.unitasglobal.net (50.115.91.200)  22.514 ms  18.324 ms
 9  e0-15.cr1.fra1.de.unitasglobal.net (50.115.91.200)  20.114 ms
    ae1-3.cr2.fra1.de.unitasglobal.net (50.115.90.105)  19.693 ms
    e0-15.cr1.fra1.de.unitasglobal.net (50.115.91.200)  19.996 ms
10  ae1-3.cr2.fra1.de.unitasglobal.net (50.115.90.105)  20.914 ms
    storj.cust.fra1.de.unitasglobal.net (45.15.192.34)  18.722 ms
    ae1-3.cr2.fra1.de.unitasglobal.net (50.115.90.105)  20.359 ms
11  storj.cust.fra1.de.unitasglobal.net (45.15.192.34)  20.606 ms  19.747 ms  20.375 ms
12  * * *
13  * * *

The latter two don’t seem to have any timeouts (although timeouts alone aren’t conclusive evidence of a problem).

Any ideas on what to try next to figure out why performance is so bad for them? They don’t have issues with other sites, and they play a lot of online games, so they’d notice a poor connection.

My initial thought is IPv6 vs. IPv4?

I’m in London, and my traceroute (posted as a screenshot) is not dissimilar.

That’s interesting: the successful ones seem to be IPv4-based :thinking:
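One way to check the IPv4-vs-IPv6 hunch against each trace in this thread is to classify the address each trace targets or ends on. The sketch below is a minimal Python example built on two assumptions:

- Windows `tracert` shows the address either in `[...]` or bare at the end of the line.
- Unix `traceroute` shows it in `(...)`.

The regex and the `address_family` helper are illustrative only. They are not part of any tool.

```python
import ipaddress
import re

# Matches an address wrapped in brackets, e.g. "[2001:470:1:209::2]" (Windows tracert),
# or in parentheses, e.g. "(185.244.226.3)" (Unix traceroute).
_ADDR = re.compile(r"[\[(]([0-9A-Fa-f:.]+)[\])]")


def address_family(traceroute_line: str) -> str:
    """Return "IPv4" or "IPv6" for the address on one traceroute line.

    Falls back to the last whitespace-separated token when the address is bare,
    e.g. "12    78 ms    79 ms    79 ms  2a10:c640:0:1::4".
    Raises ValueError if no valid address is found (e.g. "Request timed out.").
    """
    match = _ADDR.search(traceroute_line)
    candidate = match.group(1) if match else traceroute_line.split()[-1]
    return f"IPv{ipaddress.ip_address(candidate).version}"
```

Applied to the traces above:

- The slow CommunityFibre trace and the Melbourne trace both end at 2a10:c640:0:1::4, which is IPv6.
- The Sky and Plusnet traces target 185.244.226.3 and 185.244.226.4, which are IPv4.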

Here’s my traceroute, though I’m based in Melbourne:

  1     1 ms     1 ms     1 ms  2404-e80-613e-1-ae84-c6ff-fe31-3e72.dyn6.launtel.net.au [2404:e80:613e:1:ae84:c6ff:fe31:3e72]
  2     4 ms     3 ms     3 ms  2404-e80-691d-d2-a55-31ff-fed3-59cf.dyn6.launtel.net.au [2404:e80:691d:d2:a55:31ff:fed3:59cf]
  3     3 ms     2 ms     2 ms  core02-m1s.launtel.net.au [2404:e80:6000:516::17]
  4     3 ms     2 ms     4 ms  2401:3cc0:200:501::
  5     3 ms     2 ms     2 ms  2401:3cc0::4:142
  6    13 ms    13 ms    13 ms  2401:3cc0::4:68
  7   145 ms   146 ms   146 ms  2401:3cc0::4:a8
  8   146 ms   146 ms   145 ms  2401:3cc0:1009:2::1
  9   146 ms   148 ms   147 ms  2001:438:fffe::795
 10     *        *        *     Request timed out.
 11   205 ms   204 ms   204 ms  2a10:c640:0:1::4

I’m IPv6-based but haven’t noticed any problems. This one is a real head-scratcher; I’m not even sure where to look.

Wow, thanks for this. I have no real idea how to read these traceroutes, so this is incredibly helpful. Thank you so much.

I know the * * * responses mean the hop isn’t responding, but is “Request timed out” also a sign that something else is amiss? (I notice some * * * responses don’t have the timeout message :man_shrugging:)

Again, thank you!
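On the `* * *` versus “Request timed out” question, the two messages are the same signal from different tools:

- Windows `tracert` prints “Request timed out.” on a hop where none of the probes got a reply.
- Unix-style `traceroute` (as in the Sky and Plusnet traces) prints `* * *` for the same case.
- A single `*` among replies only means that one probe went unanswered.

Many routers also rate-limit or deprioritise replies to traceroute probes. A silent hop followed by responsive hops is therefore not, by itself, evidence of packet loss.

The sketch below is a minimal Python example that lists fully silent hops in either format. The `silent_hops` helper is hypothetical, and it assumes the line layouts pasted in this thread.

```python
from __future__ import annotations

import re

# A hop line starts with the hop number, e.g. "  6     *  ..." or " 8 * * *".
_HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def silent_hops(trace: str) -> list[int]:
    """Return hop numbers for which no probe received a reply.

    Treats Windows tracert's "Request timed out." and Unix traceroute's
    "* * *" as the same condition: every probe for that hop went unanswered.
    """
    silent = []
    for line in trace.splitlines():
        match = _HOP.match(line)
        if not match:
            continue  # header lines, or continuation lines listing extra routers
        rest = match.group(2)
        if "Request timed out" in rest or rest.split() == ["*", "*", "*"]:
            silent.append(int(match.group(1)))
    return silent
```

Hops with partial replies, such as hop 3 in the first trace (`2 ms  2 ms  *`), are deliberately not counted as silent.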

Ok, this is very strange.

I just tested uploads to the gateway from a friend’s connection (in Melbourne), and he’s also seeing very slow uploads. His connection is 100/50 Mbit/s, but he’s getting roughly 10% of the upload speed he should (an estimated 1 hour to upload a 1 GB file).

I’m at my wits’ end trying to work out why some connections see great speeds to the gateway while others see terrible ones.

If people have time, could you try uploading a dummy file to the following endpoint and report the speed?

  • There is no speed summary…

[Screenshot: summary]

No, I’m not tracking the speed (yet). Did it complete fairly quickly, though (under ten minutes for 1 GB)?

Only 113 MB, in 24 s.

I can see the files coming through, around 100 MB each. That upload should have run at nearly your maximum connection speed.

912 MB in 2 min 40 s from Poland… wow, pretty fast :slight_smile:

Data from speedtest.net:

Testing download speed........................................
Download: 489.38 Mbit/s
Testing upload speed..................................................
Upload: 62.01 Mbit/s

For comparison, the same file on https://transfer.sh/:

912 MB in 11 min 35 s

My hunch is that this is an IPv6 routing issue :thinking:

Location: Denmark

File: 1GB-Th3Van-Denmark-002.zip (1.073.741.824 bytes)

Upload time: ~53 s

// Th3Van.dk
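Converting each reported size and time into average throughput puts the reports in this thread on a common scale. The sketch below is a minimal Python example with two assumptions:

- “MB” in the reports means decimal megabytes (10^6 bytes).
- The approximate times (Denmark’s ~53 s and the Melbourne “1 hour for 1 GB” estimate) are treated as exact.

The helper names are illustrative.

```python
MB = 10**6  # assumption: "MB" in the reports means decimal megabytes


def upload_mbit_per_s(size_bytes: float, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbit = 10**6 bits)."""
    return size_bytes * 8 / seconds / 1e6


def to_seconds(minutes: float = 0, seconds: float = 0) -> float:
    return minutes * 60 + seconds


# (size in bytes, elapsed seconds) as reported in this thread
reports = {
    "Poland, 113 MB test": (113 * MB, to_seconds(0, 24)),
    "Poland, gateway": (912 * MB, to_seconds(2, 40)),
    "Poland, transfer.sh": (912 * MB, to_seconds(11, 35)),
    "Denmark, gateway": (1_073_741_824, to_seconds(0, 53)),
    "Melbourne, estimate": (1000 * MB, to_seconds(60, 0)),
}

for label, (size, secs) in reports.items():
    print(f"{label}: {upload_mbit_per_s(size, secs):.1f} Mbit/s")
# Poland, 113 MB test: 37.7 Mbit/s
# Poland, gateway: 45.6 Mbit/s
# Poland, transfer.sh: 10.5 Mbit/s
# Denmark, gateway: 162.1 Mbit/s
# Melbourne, estimate: 2.2 Mbit/s
```

On these assumptions, the Poland gateway upload reaches roughly three quarters of that user’s 62.01 Mbit/s speedtest upload figure.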

For context, here’s how this upload system works:

  1. When you upload a file, I split it into chunks based on the file size and generate a signed URL to Storj’s gateway.

  2. The browser uploads each part, with three uploads running concurrently.

  3. As each upload completes, I store its ETag.

  4. After all uploads complete, the browser notifies my server, and I instruct the gateway to assemble the parts into the final file.

So at no point in this process do any uploads pass through my server.
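The four steps above appear to follow the usual S3 multipart-upload pattern: upload numbered parts, collect each part’s ETag, then ask the gateway to complete the upload. The sketch below is a minimal Python example of the client side. It assumes hypothetical `sign_url` and `put_part` callables in place of the app’s server endpoint and the HTTP PUT. The real app does this in the browser, so the sketch only illustrates the control flow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_into_parts(data: bytes, part_size: int) -> List[bytes]:
    """Step 1: cut the file into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def upload_all_parts(
    data: bytes,
    part_size: int,
    sign_url: Callable[[int], str],         # hypothetical: returns a signed URL for a part number
    put_part: Callable[[str, bytes], str],  # hypothetical: PUTs the bytes, returns the ETag
    concurrency: int = 3,                   # three uploads running concurrently
) -> List[Tuple[int, str]]:
    parts = split_into_parts(data, part_size)

    def upload(numbered: Tuple[int, bytes]) -> Tuple[int, str]:
        number, chunk = numbered
        etag = put_part(sign_url(number), chunk)  # step 3: keep this part's ETag
        return number, etag

    # Step 2: at most `concurrency` parts are in flight at once.
    # pool.map returns results in part order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(upload, enumerate(parts, start=1)))

    # Step 4: these (part number, ETag) pairs go to the server,
    # which asks the gateway to assemble the parts.
    return results
```

Passing fakes for the two callables makes the flow testable without network access. With real calls, `put_part` would return the `ETag` response header of each PUT.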

Rodeoclash,

I hope this finds you well. On Friday, your post here on the forum caught the interest of one of our developers, and we started an internal investigation to see whether there was anything on our end, potentially within gateway-mt. Our investigation was similar to yours, except that we gathered traceroutes from various locations around the globe. We saw some instances of intermittent communication issues, mostly in the form of packet loss and high ping times.

From what we could tell, the timeouts and lengthy requests were occurring outside of the edge services. One of our infrastructure engineers noticed that these issues were occurring on IPv6 routes. I believe the internal investigation is still ongoing. We will follow up if we find anything actionable or if we can verify that it is an external routing issue.

Code_Breaker


Great! Thanks for taking a look at it :slightly_smiling_face:

Any updates on this?

I asked a couple of users who are suffering from slow uploads to the gateway to retest; they’re still seeing the issue.

@Rodeoclash
Can you post (or DM me) the users’ locations, their traceroutes, and the speeds they were experiencing?

Although last week we attributed most of the packet-loss problems to some of our uplink providers, there may still be an issue with geographically non-optimal routing. We’re continuously testing our routes from dozens of locations around the world.

We officially support mainly the gateway.storjshare.io domain, but we used to have domains pointing to specific regions, such as gateway.eu1.storjshare.io, gateway.us1.storjshare.io, and gateway.ap1.storjshare.io. Their implementation differs slightly from the domain we endorse now. Can you test these domains as well? They might be an important data point for our investigation, and possibly a temporary workaround.
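One way to produce that data point is to time the same upload against each domain. The sketch below is a minimal Python example. It assumes a hypothetical `upload(host, payload)` callable that performs a real upload through the given gateway host, for example via signed URLs generated for that host. Only the hostnames come from this thread.

```python
import time
from typing import Callable, Dict, Iterable

ENDPOINTS = [
    "gateway.storjshare.io",
    "gateway.eu1.storjshare.io",
    "gateway.us1.storjshare.io",
    "gateway.ap1.storjshare.io",
]


def time_endpoints(
    endpoints: Iterable[str],
    upload: Callable[[str, bytes], None],  # hypothetical: uploads `payload` via `host`
    payload: bytes,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Upload the same payload through each endpoint; return Mbit/s per host."""
    results = {}
    for host in endpoints:
        start = clock()
        upload(host, payload)
        elapsed = clock() - start
        results[host] = len(payload) * 8 / elapsed / 1e6
    return results
```

Run it from an affected connection and from an unaffected one, with the same payload. Comparing the two would show whether the regional domains behave differently from gateway.storjshare.io for those users.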


I’ve hardcoded the gateway to gateway.ap1.storjshare.io, and I’m seeing speed improvements for at least one user so far. I’m waiting for the other user with the same issue to retest and will post again if they see an improvement as well.