Gateways, upload speeds

Hello,

We have been experimenting with Storj for the last few days, and we would like some feedback from the community :slight_smile:

We have huge files to upload to Storj from an Ubuntu server (64GB RAM).

We used rclone with an upload concurrency of 48 and a 1024M chunk size, but the transfer speed was not that high (~320 Mbit/s).

We have about 5 times this bandwidth, so 320 Mbit/s is not a limit on our side.
We also tried from several servers on another network with even more bandwidth (10 Gbit/s network card): the download speed was 720 Mbit/s, and the upload speed 495 Mbit/s.

Any advice on how to do better than this?
Could this be a gateway performance problem? We use https://gateway.eu1.storjshare.io
Could a self-hosted gateway improve these stats? What are the benefits of having a self-hosted gateway?

Thanks for your help.

Your questions are beyond my knowledge, but this thread might help: Hotrodding Decentralized Storage

Also @elek and @Dominick have both been involved with forum discussions on gateway performance.


Greetings, I’m the author of our performance tuning documentation. I can help with tuning. No need for a self-hosted gateway.

Tuning
A 1 GB file contains 16 64 MB segments, so you can use a concurrency of 16 with rclone. The command looks as follows.

rclone copy --progress --s3-upload-concurrency 16 --s3-chunk-size 64M 1gb.zip remote:bucket

For a 4 GB file you can crank concurrency up to 64.

Notes
Please use a bandwidth monitor like bmon to watch your network usage. There is no need to set a concurrency higher than the file size divided by 64 MB. Finally, please also set --s3-chunk-size 64M.
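As a sketch of the rule above (concurrency ≈ file size / 64 MB), here is how you might derive the value; the 1 GiB file size is just an example:

```shell
# Derive --s3-upload-concurrency from the file size, assuming 64 MiB
# chunks (one chunk per Storj segment). Replace file_size_bytes with
# the real size, e.g. from `stat -c%s yourfile`.
file_size_bytes=$((1024 * 1024 * 1024))   # example: a 1 GiB file
chunk_bytes=$((64 * 1024 * 1024))         # 64 MiB chunks
concurrency=$(( (file_size_bytes + chunk_bytes - 1) / chunk_bytes ))  # ceiling division
echo "$concurrency"                       # 1 GiB / 64 MiB = 16
```

The computed value is then what you pass to `--s3-upload-concurrency`.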

Machine Notes
If uploading from a spinning disk, please consider the limitations of such disks: their throughput drops sharply under many concurrent read operations. For best performance, upload from an SSD or an NVMe drive.

Best of luck and let us know how we can support you!

-Dominick


Hello,

Many thanks for your answers.

We tried several parameter sets, and the one that gives the best transfer speed is:

rclone copy --progress --s3-upload-concurrency 48 --s3-chunk-size 1024M --buffer-size 8192M --transfers 1 file.XX BUCKETOC1:bucketoc1/OC

The sizes of the files we are uploading really vary, but the biggest we have are 6 TB files.
How would you set chunk size and concurrency for such files to improve performance (with 64 GB RAM)?

Is using uplink with parallelism an option to consider for this kind of upload?
If yes, any advice? :slight_smile:

Again, thanks for your help

What’s your upstream Internet bandwidth?

We don’t have a cap; we have a guaranteed 1.5 Gbit/s (the rest is just a bonus :))
On the other server that we used yesterday for testing, we had 10 Gbit/s guaranteed.

Sorry for the spam.

I tried uplink with a parallelism of 8, which performs better (470 Mbit/s upload).
But when I try with a parallelism of 16, I get this error:

Error: uplink: stream: ecclient: successful puts (0) less than or equal to repair threshold (35); context canceled; context canceled; context canceled; context canceled; context canceled; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user; uplink: stream: context canceled; ecclient: upload cancelled by user

I’ve found a few links that mention this error and the possibility that there are not enough nodes available.

We have a question: are you getting this error every time you try with a parallelism of 16, or only sometimes?

On the other hand, please tell us

  • How many physical CPU cores does the machine you’re using have?
  • Try a parallelism of 12 and report back whether you get the same error, and if so, whether it happens every time or only sometimes.

Thank you for running the above tests and for using our services.

I tried with several servers.

These tests were made on a 16-CPU machine:

(screenshot)

And it crashes with a parallelism of 16, 14 (a weird test, maybe? :slight_smile:), and 12:

With a parallelism of 12, I ran the same test 5 times, and it crashed every time (not at the same uploaded size).

The most common cause of crashes when uploading with high parallelism via the uplink CLI is running out of memory.

Please also note that uplink has a 2.68x upload multiplier, so an observed 470 Mbit/s is really about 1,260 Mbit/s on the wire, which is likely the upper bound of your router or internet connection. Please use a network monitoring tool like bmon to monitor your upload speeds.
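For reference, the multiplier arithmetic (2.68 is the erasure-coding expansion factor quoted above; 470 Mbit/s is the observed uplink rate from the earlier post):

```shell
# Observed client-side throughput times the 2.68x erasure-coding
# expansion factor gives the approximate on-the-wire upload rate.
awk 'BEGIN { printf "%.0f\n", 470 * 2.68 }'   # prints 1260
```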

You will get the best upload speeds using rclone with our hosted gateway. At these speeds and connection counts it is not uncommon to run into gating factors such as lower-powered networking hardware. Please share as much information as possible about your environment.

Just for comparison, I did a quick test with uplink on our 88-core / 512 GB RAM / NVMe Ubuntu server on a 10 Gbit WAN, with --parallelism 16.

150 GB file - Upload

root@server030:~# fallocate -l 150Gib 150GiB.file
root@server030:~# time uplink cp --parallelism 16 150GiB.file sj://test/
16:43:01.717    INFO    Configuration loaded
        Location: /root/.local/share/storj/uplink/config.yaml
16:43:01.718    DEBUG   Anonymized tracing disabled
16:43:01.718    DEBUG   debug server listening on 127.0.0.1:36869
150.00 GiB / 150.00 GiB [--------------------------] 100.00% 260.40 MiB p/s
Created sj://test//150GiB.file
16:52:51.796    INFO    skipped metric proc_stat,scope=github.com/spacemonkeygo/monkit/v3/environment Rsslim because admproto: value not representable in float16
16:52:51.797    INFO    skipped metric runtime_memstats,scope=github.com/spacemonkeygo/monkit/v3/environment LastGC because admproto: value not representable in float16

real    9m50,345s
user    227m35,614s
sys     248m9,456s

150 GB file - Download

root@server030:~# time uplink cp --parallelism 16 sj://test/150GiB.file 150GiB.file2
16:56:10.998    INFO    Configuration loaded
        Location: /root/.local/share/storj/uplink/config.yaml
16:56:10.998    DEBUG   Anonymized tracing disabled
16:56:10.998    DEBUG   debug server listening on 127.0.0.1:46281
150.00 GiB / 150.00 GiB [--------------------------] 100.00% 291.18 MiB p/s
Downloaded sj://test/150GiB.file to 150GiB.file2
17:04:58.714    INFO    skipped metric proc_stat,scope=github.com/spacemonkeygo/monkit/v3/environment Rsslim because admproto: value not representable in float16
17:04:58.714    INFO    skipped metric runtime_memstats,scope=github.com/spacemonkeygo/monkit/v3/environment LastGC because admproto: value not representable in float16

real    8m47,856s
user    192m48,276s
sys     455m56,578s

root@server030:~# uplink version
Release build
Version: v1.44.1
Build timestamp: 30 Nov 21 21:38 CET
Git commit: bae8276f3f59951bfe01498b6e49b9037309b36c

Th3Van.dk


You must have a magic connection, because that looks more like cache than internet speeds… No matter what I do, I cannot maintain a steady 100 MiB/s when I upload.

Thanks @Th3Van for the tests; I would love to achieve that performance :slight_smile:
Are you using the EU gateway?
Do you have specific parameters set in uplink?

@Dominick

The most common cause of crashes when uploading with high parallelism via the uplink CLI is running out of memory.

I checked, and we never exhausted the server’s resources, nor even approached high values.
Even if uplink performs better, we cannot trust it yet because it crashes too often.

I tested your command, Dominick, on a 20 GiB file:

rclone copy --progress --s3-upload-concurrency 16 --s3-chunk-size 64M 20gb.zip remote:bucket

I also tested several other parameter sets and several servers; here are the results:

SERVER 1 (16 CPU, 128 GB RAM, NVMe disks):

Rclone 96 / 64M = 343Mbits/s
Rclone 96 / 512M = 350Mbits/s
Rclone 96 / 1024M = 361Mbits/s
Rclone 96 / 2048M = 355Mbits/s
Rclone 96 / 2048M US GATEWAY = Too slow, stopped it
Rclone no-concurrency / no-chunk = Too slow, stopped it

SERVER 2 (32 CPU, 64 GB RAM, HDD):

Rclone 48 / 1024M = 285Mbits/s
Stopped the test there

SERVER 3 (32 CPU, 120 GB RAM, 10 Gbit WAN, SSD):

Rclone 48 / 1024M = 434Mbits/s
Rclone 96 / 512M = 463Mbits/s
Rclone 96 / 1024M = 455Mbits/s
Rclone 1536 / 64M = 474Mbits/s

PS: I know some tests don’t really make sense, because with a 20 GiB file you cannot have more concurrency than the number of chunks, but I just wanted to test :slight_smile:

I still don’t know where the problem could be :frowning:
Thanks for your advice and help.

Regarding Th3Van’s performance: it looks great. They are using uplink, so GatewayMT is not involved; uplink always talks directly to the nodes (native).

Regarding uplink, it’s very stable and works great. We should do a support session and figure out what is going on.

Let’s evaluate your environment, ODCDockers; below are a few questions.

  • What is the CPU load of your router?
  • Is this on a VPS provider? If so who?
  • Is QOS being utilized in your environment?
  • I see you are using rclone. Are you using the Native “Tardigrade” integration or the generic S3 integration with our hosted gateway?

Given the large number of connections used when pushing performance, this can be network intensive.

Finally, I can offer a tuning session next week to dig into your configuration and get you the best possible speeds. Please advise if you would like me to set up a meeting.


I found the problem with uplink.

In a screenshot earlier in this thread, it said “too many open files …”

So I checked the default value:

ulimit -n
= 1024

After raising this to a higher value, there was no problem with a parallelism of 32.
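For anyone hitting the same issue, a sketch of raising the limit for the current shell session (65536 is an illustrative value, not an official recommendation, and cannot exceed the hard limit shown by `ulimit -Hn`):

```shell
# Show the current soft limit on open file descriptors; uplink opens
# many node connections in parallel, so the default 1024 is easily hit.
ulimit -n

# Raise the soft limit for this shell before launching uplink.
# 65536 is an illustrative value, capped by the hard limit (ulimit -Hn).
ulimit -n 65536
```

For a permanent change, the limit can also be set in `/etc/security/limits.conf` or the relevant systemd unit.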

Tonight we achieved 580 Mbit/s, which is our best so far.
The test was run at 21:00, when we have a lot of network traffic, so tomorrow we should see even better performance.

@Dominick,

  • What is the cpu load of your router?
    Almost none

  • Is QOS being utilized in your environment?
    No

  • I see you are using rclone. Are you using the Native “Tardigrade” integration or the generic S3 integration with our hosted gateway?
    Generic S3.
    I just tried the native Tardigrade integration, and performance was 166 Mbit/s (for a 20 GiB file).

Thanks for your help.
I’ll keep you posted tomorrow with more tests.
