Downloading from AWS S3 vs Storj S3

hashbackup · November 6, 2021, 12:06am

I’ve been working on optimizing the download performance of HashBackup’s S3 drivers. Multipart uploads are working well and scale well, both on S3 and Storj’s S3 gateway. Multipart downloads are not scaling as well: they seem to perform the same as single-part downloads. It’s not a VM limitation, because both Amazon’s aws CLI and Storj’s uplink CLI download faster.

While investigating that, I’m also trying to figure out why Storj’s S3 download is slower than AWS. I’ve added some debug statements whenever recv() is called to receive data from a socket.

Here is the end of a 128MB Storj S3 download:

key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7408
key._get_file read= 7200
Down 6.9s 18.6 MiB/s Del 624.9ms 204.8 MiB/s

Here’s the end of a 128MB download from AWS:

key._get_file read= 16384
key._get_file read= 12928
key._get_file read= 9000
key._get_file read= 16384
key._get_file read= 10616
key._get_file read= 16384
key._get_file read= 16384
key._get_file read= 16384
key._get_file read= 13848
key._get_file read= 10102
Down 1.4s 88.7 MiB/s Del 34.9ms 3.6 GiB/s

I’m curious why recv() usually gets about twice as much data per call from S3 as from Storj (up to 16384 bytes vs 7408). Is this some kind of network configuration difference that maybe can be bumped up on Storj’s gateway, like jumbo frames?
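
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of logging the size of every recv() result. It is not HashBackup's code: the function name, the default 16384-byte buffer, and the log format are assumptions chosen to resemble the output above.

```python
import socket


def recv_all_logged(sock, bufsize=16384, log=print):
    """Read from sock until EOF, logging how many bytes each recv() returned.

    bufsize is only an upper bound: recv() returns whatever is available,
    so the logged sizes show how the data actually arrives.
    """
    sizes = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # an empty result means the peer closed the connection
            break
        sizes.append(len(data))
        log(f"read= {len(data)}")
    return sizes


if __name__ == "__main__":
    # Local demo using a connected socket pair instead of a real S3 endpoint.
    sender, receiver = socket.socketpair()
    sender.sendall(b"x" * 5000)
    sender.close()
    recv_all_logged(receiver, bufsize=2048)
    receiver.close()
```

With a TLS connection the same loop would apply to the wrapped socket, where each read can return at most what the TLS layer has already decrypted.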

Update: after working on it for 3 days, I still haven’t figured out why HashBackup can’t scale S3 downloads. Uploads scale fine, but downloads are stuck at 20 MB/s for Storj and 90 MB/s for S3. I have managed to make downloads use 25% less CPU, but that apparently isn’t the bottleneck.

The good news is that it’s some software limitation in HashBackup I haven’t found yet: if I use the aws CLI to download files from Storj’s S3 MT Gateway (S3MTGW) with the chunk size set to 64M, it does scale (this is on a 6 CPU VM):

[root@hbsj ~]# /usr/bin/time -v aws --profile=sj --endpoint-url=https://gateway.us1.storjshare.io s3 cp s3://hbtest/hbtest/arc.1.0 .
Completed 212.8 MiB/955.9 MiB (88.4 MiB/s) with 1 file(s) remaining 
Completed 509.0 MiB/955.9 MiB (126.3 MiB/s) with 1 file(s) remaining
Completed 705.8 MiB/955.9 MiB (122.4 MiB/s) with 1 file(s) remaining
Completed 863.5 MiB/955.9 MiB (113.1 MiB/s) with 1 file(s) remaining
download: s3://hbtest/hbtest/arc.1.0 to ./arc.1.0                   
	Command being timed: "aws --profile=sj --endpoint-url=https://gateway.us1.storjshare.io s3 cp s3://hbtest/hbtest/arc.1.0 ."
	User time (seconds): 5.21
	System time (seconds): 5.01
	Percent of CPU this job got: 99%

Update 2: I finally found the reason downloads weren’t scaling, and it’s a little embarrassing. In normal operation, HashBackup stores remote file sizes in a small database, to avoid remote requests. The benchmark command bypasses the database, so for downloads the file size known before the download was always 1. With a size that small, HB in its infinite wisdom was not using multipart transfers for downloads at all. The fix was a 1-line change. Much better results now with 16 threads:

[root@hbsj ~]# hb dest -c hb test -s 64m 128m 256m 512m 1g
HashBackup #2598 Copyright 2009-2021 HashBackup, LLC
Using destinations in dest.conf
Warning: destination is disabled: sj
Warning: destination is disabled: s3
Warning: destination is disabled: b2
Warning: destination is disabled: gs

2021-11-08 00:37:09 ---------- Testing sjs3 ----------

  64 MiB:
    Round 1  Up   4.5s   14.2 MiB/s   Down   2.8s   23.0 MiB/s   Del 820.2ms  78.0 MiB/s  
    Round 2  Up   2.9s   22.0 MiB/s   Down   3.3s   19.5 MiB/s   Del 914.7ms  70.0 MiB/s  
    Round 3  Up   4.4s   14.5 MiB/s   Down   3.3s   19.3 MiB/s   Del 907.7ms  70.5 MiB/s  
  > Average  Up   4.0s   16.2 MiB/s   Down   3.1s   20.5 MiB/s   Del 880.9ms  72.7 MiB/s  

  128 MiB:
    Round 1  Up   4.0s   32.2 MiB/s   Down   5.2s   24.8 MiB/s   Del 912.6ms 140.3 MiB/s  
    Round 2  Up   4.5s   28.2 MiB/s   Down   3.7s   35.0 MiB/s   Del 934.9ms 136.9 MiB/s  
    Round 3  Up   3.8s   34.0 MiB/s   Down   4.4s   29.3 MiB/s   Del 712.3ms 179.7 MiB/s  
  > Average  Up   4.1s   31.3 MiB/s   Down   4.4s   29.1 MiB/s   Del 853.3ms 150.0 MiB/s  

  256 MiB:
    Round 1  Up   4.2s   61.0 MiB/s   Down   4.2s   60.6 MiB/s   Del 920.0ms 278.3 MiB/s  
    Round 2  Up   5.1s   49.8 MiB/s   Down   6.3s   40.8 MiB/s   Del  31.5s    8.1 MiB/s  
    Round 3  Up   4.2s   61.6 MiB/s   Down   4.4s   57.8 MiB/s   Del 935.0ms 273.8 MiB/s  
  > Average  Up   4.5s   56.9 MiB/s   Down   5.0s   51.5 MiB/s   Del  11.1s   23.0 MiB/s  

  512 MiB:
    Round 1  Up   6.4s   80.1 MiB/s   Down   4.1s  123.4 MiB/s   Del   1.1s  451.5 MiB/s  
    Round 2  Up   6.3s   81.4 MiB/s   Down   4.3s  118.6 MiB/s   Del   1.3s  383.4 MiB/s  
    Round 3  Up   4.9s  105.3 MiB/s   Down   5.1s  100.8 MiB/s   Del   1.4s  366.8 MiB/s  
  > Average  Up   5.8s   87.6 MiB/s   Down   4.5s  113.4 MiB/s   Del   1.3s  397.4 MiB/s  

  1 GiB:
    Round 1  Up   8.1s  127.1 MiB/s   Down   4.6s  224.3 MiB/s   Del   2.1s  479.2 MiB/s  
    Round 2  Up   7.2s  142.2 MiB/s   Down   4.7s  216.8 MiB/s   Del   2.3s  436.2 MiB/s  
    Round 3  Up   7.9s  130.2 MiB/s   Down   5.6s  183.4 MiB/s   Del   2.1s  480.2 MiB/s  
  > Average  Up   7.7s  132.8 MiB/s   Down   5.0s  206.6 MiB/s   Del   2.2s  464.3 MiB/s  

Test complete
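
To illustrate the bug described in Update 2, here is a rough sketch of how a multipart download might plan its byte ranges from the file size and fetch them with a thread pool. This is not HashBackup's actual code: `plan_ranges`, `download`, the `fetch_range` callable, and the 8 MiB part size are all hypothetical names and values picked for the example.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


def plan_ranges(size, part_size):
    """Split size bytes into inclusive (start, end) ranges of at most part_size bytes."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def download(fetch_range, size, part_size=8 * MIB, threads=16):
    """Fetch every planned range concurrently and join the parts in order.

    fetch_range(start, end) is a hypothetical callable returning the bytes of
    an inclusive range, for example via an HTTP "Range: bytes=start-end" GET.
    """
    ranges = plan_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so the parts join back correctly.
        return b"".join(pool.map(lambda r: fetch_range(*r), ranges))


if __name__ == "__main__":
    # A believed size of 1 plans a single part, so nothing runs in parallel.
    print(len(plan_ranges(1, 8 * MIB)))
    # With the real size, a 128 MiB file splits into many parts.
    print(len(plan_ranges(128 * MIB, 8 * MIB)))
```

The point of the sketch is that the part plan depends entirely on the size passed in, so a placeholder size silently disables parallelism without causing any error.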

I did some tests with uplink --parallelism 16 downloads on the same setup and got an average of 203 MB/s, so these times through the S3 MT Gateway are comparable. The S3MTGW is probably more manageable for most users since uplink with --parallelism 16 makes around 640 outbound connections at once.

There is still room to scale HashBackup’s uploads further by using processes instead of threads. Switching to processes was the last change I made to downloads before realizing that something much simpler was keeping them from scaling.

Footnote: I think the 30-second delete in the 256 MiB test is a network timeout bug caused by changes I made over the last 3 days, so ignore that.

Dominick · November 8, 2021, 4:21pm

Always appreciate you being so verbose with your writeups. >200MB/s enables some pretty good RTOs.

Thanks for all you do!
-Dominick