Upload refactoring

littleskunk · May 4, 2023, 12:27pm

We have a new uplink binary and a new gateway-mt. The main difference is a faster upload, especially for bigger files. The old way to upload a file was based on segments, and it had 2 big disadvantages. First, if too many piece transfers failed, the entire segment failed, and a retry was only possible for the entire segment. The second problem was performance: the uplink uploaded one segment at a time and waited for that segment to finish before starting the next one. It was possible to upload multiple segments at the same time, but that also increased the number of connections and could cause other problems.

The new code has a retry on the piece level. If too many piece transfers fail the new code will just retry uploading these pieces and still finish the transfer. Instead of concurrent segment transfers, we now have concurrent piece transfers. This also means it can be scaled down to the equivalent of 0.5 concurrent segments or lower. Even with such a low concurrency, there will be no pause between 2 segments. So overall a much more stable and faster upload.
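To make the difference concrete, here is a toy model of the two retry strategies. It is only a sketch: the function names, the `attempt` callback, and the numbers (10 pieces, 8 needed) are made up for illustration and are not taken from the uplink code.

```python
from typing import Callable

# (piece index, attempt number) -> True if that transfer succeeded.
# Hypothetical callback, used only to make the model deterministic.
Attempt = Callable[[int, int], bool]


def upload_with_segment_retry(pieces: int, needed: int, attempt: Attempt,
                              max_rounds: int = 5) -> int:
    """Old-style model: too few successes means the whole segment is redone.

    Returns the total number of piece transfers performed.
    """
    transfers = 0
    for round_no in range(max_rounds):
        succeeded = 0
        for piece in range(pieces):
            transfers += 1
            if attempt(piece, round_no):
                succeeded += 1
        if succeeded >= needed:
            return transfers
    raise RuntimeError("segment upload failed")


def upload_with_piece_retry(pieces: int, needed: int, attempt: Attempt,
                            max_attempts: int = 5) -> int:
    """New-style model: only the failed pieces are retried."""
    transfers = 0
    succeeded = 0
    pending = list(range(pieces))
    for attempt_no in range(max_attempts):
        failed = []
        for piece in pending:
            transfers += 1
            if attempt(piece, attempt_no):
                succeeded += 1
            else:
                failed.append(piece)
        if succeeded >= needed:
            return transfers
        pending = failed
    raise RuntimeError("segment upload failed")


def flaky(piece: int, attempt_no: int) -> bool:
    """Example failure pattern: pieces 0-3 fail on their first attempt only."""
    return not (attempt_no == 0 and piece < 4)


print(upload_with_segment_retry(10, 8, flaky))  # transfers with whole-segment retry
print(upload_with_piece_retry(10, 8, flaky))    # transfers with per-piece retry
```

In this model the per-piece strategy repeats only the four failed transfers, while the per-segment strategy repeats all ten.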

A new uplink binary is available here: Release v1.78.2 · storj/storj · GitHub
Please let us know if you have any problems with it.

Today we are also enabling the new upload code in gateway-mt. Note: for better performance, use a multipart part size larger than 64MB, for example 1GB.

Up next:
Libuplink will be switched over to the new code. That will enable tools like rclone to benefit from it.
Also on the roadmap is a similar refactoring for downloads.

BrightSilence · May 4, 2023, 10:46pm

This is great news and will definitely help performance for bigger file transfers. I’m wondering, could multiple files be treated the same way in native connections? So, starting uploads for the next file when the first piece transfer of the previous file is finished. I have always been a little confused about why concurrency and parallelism are two separate options. Especially with this new way of doing things, having concurrency work across files would be much more efficient.

Th3Van · May 5, 2023, 12:54am

Created a 25GB file to test upload speed using --parallelism 10, and it looks like the new version is a bit slower than the old uplink version…

root@server030:/disk103/tmp/uplink-test# wget -q https://github.com/storj/storj/releases/download/v1.77.2/uplink_linux_amd64.zip ; unzip -q uplink_linux_amd64.zip; mv uplink uplink-1772; rm uplink_linux_amd64.zip
root@server030:/disk103/tmp/uplink-test# wget -q https://github.com/storj/storj/releases/download/v1.78.2/uplink_linux_amd64.zip ; unzip -q uplink_linux_amd64.zip; mv uplink uplink-1782; rm uplink_linux_amd64.zip

root@server030:/disk103/tmp/uplink-test# ./uplink-1772 version | head -n 4
Release build
Version:            v1.77.2
Build timestamp:    18 Apr 23 17:31 CEST
Git commit:         d69563b37dcf77de9fb2ae98d8ff04573803212b

root@server030:/disk103/tmp/uplink-test# ./uplink-1782 version | head -n 4
Release build
Version:            v1.78.2
Build timestamp:    03 May 23 10:49 CEST
Git commit:         0981f54533849b6615e9c5198bc46a3698a5b260

root@server030:/disk103/tmp/uplink-test# time ./uplink-1772 cp --parallelism 10 ./25g-file1 sj://test/25g-file1
upload 25g-file1 to sj://test/25g-file1
25g-file1  25.00 GB / 25.00 GB [=======================================] 100.00% 54.74 MiB/s

real    1m31,378s
user    44m33,107s
sys     50m4,431s


root@server030:/disk103/tmp/uplink-test# ./uplink-1772 rm sj://test/25g-file1
removed sj://test/25g-file1 file


root@server030:/disk103/tmp/uplink-test# time ./uplink-1782 cp --parallelism 10 ./25g-file1 sj://test/25g-file1
upload 25g-file1 to sj://test/25g-file1
25g-file1  25.00 GB / 25.00 GB [=======================================] 100.00% 94.89 MiB/s

real    36m32,495s
user    59m23,546s
sys     42m48,747s

If I upload a single file using uplink v1.77.2 without the --parallelism 10 option:

root@server030:/disk103/tmp/uplink-test# ./uplink-1772 rm sj://test/25g-file1
removed sj://test/25g-file1

root@server030:/disk103/tmp/uplink-test# time ./uplink-1772 cp ./25g-file1 sj://test/ 
upload 25g-file1 to sj://test/25g-file1
25g-file1  25.00 GB / 25.00 GB [=======================================] 100.00% 156.41 MiB/s

real    12m2,139s
user    39m28,979s
sys     44m17,280s

Am I missing some new uplink option here?

Th3Van.dk

Alexey · May 5, 2023, 3:14am

$ time ./uplink-1782 cp --access us1-demo 1G.raw sj://test/
upload 1G.raw to sj://test/1G.raw
1.00 GiB / 1.00 GiB [-----------------------------------------------------------------------------] 100.00% 5.61 MiB p/s

real    3m3.903s
user    1m6.939s
sys     0m54.575s

And the default behavior is the same as with the old per-segment uplink:

$ uplink version
Release build
Version:            v1.66.1
Build timestamp:    31 Oct 22 18:47 +07
Git commit:         67a9c1135a8c071fa88540446e9c1c3297869b28

PATH                      VERSION
storj.io/storj            (devel)
storj.io/common           v0.0.0-20221024150824-a2a5c611dacf
storj.io/drpc             v0.0.32
storj.io/monkit-jaeger    v0.0.0-20220915074555-d100d7589f41
storj.io/private          v0.0.0-20221011183246-586e5f48357a
storj.io/uplink           v1.9.1-0.20221028140107-e37234d89ffd

$ time uplink cp --access us1-demo 1G.raw sj://test/
upload 1G.raw to sj://test/1G.raw
1.00 GiB / 1.00 GiB [-----------------------------------------------------------------------------] 100.00% 5.30 MiB p/s

real    3m14.532s
user    1m5.302s
sys     0m53.422s

Parallelism 10

$ time ./uplink-1782 cp --access us1-demo --parallelism 10 1G.raw sj://test/
upload 1G.raw to sj://test/1G.raw
1.00 GiB / 1.00 GiB [-----------------------------------------------------------------------------] 100.00% 5.21 MiB p/s

real    3m18.199s
user    1m9.695s
sys     1m3.133s

$ time uplink cp --access us1-demo --parallelism 10 1G.raw sj://test/
upload 1G.raw to sj://test/1G.raw
1.00 GiB / 1.00 GiB [-----------------------------------------------------------------------------] 100.00% 5.49 MiB p/s

real    3m7.964s
user    1m2.323s
sys     0m50.978s

So, no difference. Maybe the new version is a little bit slower.

Th3Van · May 5, 2023, 4:17am

Looks like one needs a fast internet pipe to actually see the speed differences between the old and the new version of uplink.

This is tested on a 10Gbit/s pipe.

root@server030:/disk103/tmp/uplink-test# openssl rand 1000000000 > 1g-file1

root@server030:/disk103/tmp/uplink-test# time ./uplink-1772 cp 1g-file1 sj://test/1g-file1
upload 1g-file1 to sj://test/1g-file1
1g-file1  1.00 GB / 1.00 GB [=======================================] 100.00% 122.36 MiB/s

real    0m27,025s
user    1m37,221s
sys     2m17,480s

root@server030:/disk103/tmp/uplink-test# ./uplink-1772 rm sj://test/1g-file1
removed sj://test/1g-file1

root@server030:/disk103/tmp/uplink-test# time ./uplink-1782 cp 1g-file1 sj://test
upload 1g-file1 to sj://test/1g-file1
1g-file1  1.00 GB / 1.00 GB [=======================================] 100.00% 39.63 MiB/s

real    2m4,481s
user    2m58,373s
sys     2m26,370s

Also, it looks like the --parallelism option has been removed from uplink now.

root@server030:/disk103/tmp/uplink-test# ./uplink-1782 cp -h --advanced   
Usage:
    uplink cp [--access string] [--recursive] [--transfers int] [--dry-run] [--progress] [--range string] [--maximum-concurrent-pieces int] [--long-tail-margin int] [--inmemory-erasure-coding] [--expires relative_date] [--metadata string] [locations ...]

    Copies files or objects into or out of storj

Arguments:
    locations    Locations to copy (at least one source and one destination). Use - for standard input/output

Flags:
        --access string                    Access name or value to use
    -r, --recursive                        Peform a recursive copy
    -t, --transfers int                    Controls how many uploads/downloads to perform in parallel (default 1)
        --dry-run                          Print what operations would happen but don't execute them
        --progress                         Show a progress bar when possible (default true)
        --range string                     Downloads the specified range bytes of an object. For more information about the HTTP Range header, see https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
        --maximum-concurrent-pieces int    Maximum concurrent pieces to upload at once per transfer (default 300)
        --long-tail-margin int             How many extra pieces to upload and cancel per segment (default 50)
        --inmemory-erasure-coding          Keep erasure-coded pieces in-memory instead of writing them on the disk during upload
        --expires relative_date            Schedule removal after this time (e.g. '+2h', 'now', '2020-01-02T15:04:05Z0700')
        --metadata string                  optional metadata for the object. Please use a single level JSON object of string to string only

Th3Van.dk

littleskunk · May 5, 2023, 7:56am

Parallelism 10 would be 10 * 110 = 1100 piece transfers. The default for the new uplink is just 300 piece transfers. That's an unfair comparison. I would expect the new uplink with --maximum-concurrent-pieces 1100 to be at least the same speed.
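The conversion between the two settings is simple arithmetic. Here is a small sketch, assuming the figure of 110 piece transfers per segment quoted above; the helper names are made up for illustration.

```python
PIECES_PER_SEGMENT = 110  # figure quoted in the post above


def pieces_for_parallelism(parallelism: float,
                           pieces_per_segment: int = PIECES_PER_SEGMENT) -> int:
    """Piece transfers implied by an old --parallelism value."""
    return round(parallelism * pieces_per_segment)


def segments_for_pieces(max_concurrent_pieces: int,
                        pieces_per_segment: int = PIECES_PER_SEGMENT) -> float:
    """Equivalent number of concurrent segments for --maximum-concurrent-pieces."""
    return max_concurrent_pieces / pieces_per_segment


print(pieces_for_parallelism(10))            # old --parallelism 10 in piece transfers
print(f"{segments_for_pieces(300):.2f}")     # new default of 300 in segments
print(pieces_for_parallelism(0.5))           # the "0.5 concurrent segments" case
```

So the new default of 300 corresponds to fewer than 3 concurrent segments in the old terms, far below --parallelism 10.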

There also seems to be --long-tail-margin that might have an impact. For my limited internet connection I would go with a lower value to reduce the overhead. I haven’t tested that yet.

Th3Van · May 5, 2023, 4:05pm

I’ve put together a little script to test uploads using an MCP (maximum-concurrent-pieces) range of 100 → 5000, combined with an LTM (long-tail-margin) range of 25 → 300, to see how much the upload time differs when uploading a 1G file.

It’s currently running, and the output can be followed here : http://www.th3van.dk/benchmark/uplink/time.log (you have to refresh the page manually)

Let me know if I should try other file sizes and/or other MCP/LTM ranges.
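For anyone who wants to run a similar sweep, here is a rough sketch of what such a script could look like. This is not the script used above. The step sizes (50 for MCP, 25 for LTM) are guesses, and the binary name, file name, and destination are placeholders. Only the two flags are taken from the `cp -h --advanced` output earlier in the thread.

```python
import itertools
import subprocess
import time

# Assumed ranges and step sizes; only the end points come from the post.
MCP_VALUES = range(100, 5001, 50)
LTM_VALUES = range(25, 301, 25)


def sweep_commands(binary, source, dest, mcp_values, ltm_values):
    """Yield one uplink cp command line per MCP/LTM combination."""
    for mcp, ltm in itertools.product(mcp_values, ltm_values):
        yield [
            binary, "cp",
            "--maximum-concurrent-pieces", str(mcp),
            "--long-tail-margin", str(ltm),
            source, dest,
        ]


def time_command(cmd):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


def run_sweep(binary="./uplink-1782", source="1g-file1",
              dest="sj://test/1g-file1"):
    """Time every combination; placeholder names, adjust before use."""
    for cmd in sweep_commands(binary, source, dest, MCP_VALUES, LTM_VALUES):
        seconds = time_command(cmd)
        print(f"{cmd[3]}\t{cmd[5]}\t{seconds:.0f}")
```

Calling `run_sweep()` starts the uploads. With these assumed steps that is over a thousand runs, so a coarser grid may be a better first pass.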

Th3Van.dk

Alexey · May 6, 2023, 4:09am

Looks like the best combinations, at 21 seconds each, are:

maximum-concurrent-pieces	long-tail-margin	upload-in-seconds
350	150	21
450	250	21
2550	125	21
4000	200	21
4250	150	21
4300	50	21

Th3Van · May 6, 2023, 7:51am

Thanks @Alexey

And if we compare that with the upload times using --parallelism on uplink v1.77.2 for the same 1GB file:

parallelism	upload-in-seconds
1	28
2	15
3	10
4	8
5	6
6	7
7	6
8	5
9	6
10	5

Output from the test script using v1.77.2 : http://www.th3van.dk/benchmark/uplink/time-1772.log
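To put the two sets of timings on the same scale, the seconds can be converted to throughput. This is a quick sketch, assuming the 1GB test file is the 1,000,000,000-byte file created with `openssl rand` earlier in the thread; the helper name is made up.

```python
FILE_BYTES = 1_000_000_000  # assumed: file made with `openssl rand 1000000000`
MIB = 1024 * 1024


def throughput_mib_s(seconds: float, size_bytes: int = FILE_BYTES) -> float:
    """Average throughput in MiB/s for an upload that took `seconds`."""
    return size_bytes / seconds / MIB


# v1.77.2 timings from the table above: parallelism -> seconds
old_times = {1: 28, 2: 15, 3: 10, 4: 8, 5: 6, 6: 7, 7: 6, 8: 5, 9: 6, 10: 5}
best_new_seconds = 21  # best v1.78.2 result reported earlier in the thread

for parallelism, seconds in old_times.items():
    print(f"v1.77.2 -p {parallelism}: {throughput_mib_s(seconds):.1f} MiB/s")
print(f"v1.78.2 best: {throughput_mib_s(best_new_seconds):.1f} MiB/s")
print(f"ratio vs -p 10: {best_new_seconds / old_times[10]:.1f}x")
```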

Th3Van.dk

Alexey · May 6, 2023, 8:12am

So, I would assume that the new version of uplink is worse regarding the upload speed.

Th3Van · May 6, 2023, 9:35am

Testing with the same 1G file and all of the “21 seconds” parameter combinations (MCP/LTM), the upload times are now higher than before:

Starting upload of 1GB file using uplink v1.77.2 @ 2023-05-06T10:55:51 CEST

- Command : ./uplink-1772 cp -p 10 1g-file1 sj://test/1g-file1-1683363351939651
- Upload time (seconds) : 5

Finished upload of 1GB file using uplink v1.77.2 @ 2023-05-06T10:55:57 CEST
------
Starting upload of 1GB files using uplink v1.78.2 @ 2023-05-06T10:55:57 CEST

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 350 --long-tail-margin 150 1g-file1 sj://test/1g-file1-1683363357850288
- Upload time (seconds) : 94

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 450 --long-tail-margin 250 1g-file1 sj://test/1g-file1-1683363452665593
- Upload time (seconds) : 80

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 2550 --long-tail-margin 125 1g-file1 sj://test/1g-file1-1683363533339900
- Upload time (seconds) : 68

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4000 --long-tail-margin 200 1g-file1 sj://test/1g-file1-1683363601588088
- Upload time (seconds) : 75

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4250 --long-tail-margin 150 1g-file1 sj://test/1g-file1-1683363677493691
- Upload time (seconds) : 62

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4300 --long-tail-margin 50 1g-file1 sj://test/1g-file1-1683363739572397
- Upload time (seconds) : 72

Finished upload of 1GB files using uplink v1.78.2 @ 2023-05-06T11:03:31 CEST

And just to be sure, I’m currently uploading a 1TB file using uplink v1.77.2 and later v1.78.2 using the --maximum-concurrent-pieces 2550 and --long-tail-margin 125 parameters.

You can follow the output here → http://www.th3van.dk/benchmark/uplink/time-1tb.log (refresh manually please)

You can also follow the upload progress via MRTG graphs → Traffic Analysis for Th3Vans storj.dk Server030

Th3Van.dk

Stob · May 6, 2023, 9:38am

@Th3Van thanks for checking this. It’s interesting to see. Could you try uploading with the --inmemory-erasure-coding setting active on the new version?

Th3Van · May 6, 2023, 9:44am

Sure, I’ve restarted the 1 TB test with the --inmemory-erasure-coding parameter.

Th3Van.dk

Alexey · May 6, 2023, 9:54am

I guess nothing changed on your side?

Th3Van · May 6, 2023, 9:57am

Not that I know of, and v1.77.2 is still uploading a 1G file in ~5-6 seconds.

Th3Van.dk

Th3Van · May 6, 2023, 12:37pm

The upload of the 1TB file (using v1.78.2) is still ongoing…

(started @ 2023-05-06T12:47:52 CEST)

Th3Van.dk

Th3Van · May 6, 2023, 3:31pm

Looks like v1.78.2 ran out of memory on a system with 512GB of RAM:

May 6 16:36:47 server030 systemd[1]: session-11243.scope: A process of this unit has been killed by the OOM killer.

1t-file1  327.69 GB / 1.00 TB [======================================================================================>----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 32.77% 15.91 MiB/s
./upload-test-1tb.sh: line 43: 3292748 Killed

I have not looked into the details yet, but I’ll give it another go tomorrow, since I’m out of the office for today.

Th3Van.dk

Stob · May 6, 2023, 4:09pm

Sorry, the --inmemory-erasure-coding parameter wouldn’t work for a 1TB file if you only have 512GB of RAM. I was wondering if it makes a difference on the 1GB file.

Th3Van · May 7, 2023, 9:41am

Ran the 1 TB upload again.

v1.77.2 :

Starting upload of 1TB file using uplink v1.77.2 @ 2023-05-07T03:20:35 CEST

- Command : ./uplink-1772 cp --parallelism 10 1t-file1 sj://test/1t-file1-1683422435999343
- 2023-05-07T03:20:44 CEST - 2.01 GB / 1.00 TB
- 2023-05-07T03:54:38 CEST - 598.55 GB / 1.00 TB
- 2023-05-07T04:09:38 CEST - 859.51 GB / 1.00 TB

- Upload time (seconds) : 3422

Finished upload of 1TB file using uplink v1.77.2 @ 2023-05-07T04:17:38 CEST

Uplink v1.77.2 uploaded the 1TB file in just under an hour (3422 seconds).

v1.78.2 :

Starting upload of 1TB file using uplink v1.78.2 @ 2023-05-07T04:17:48 CEST

- Command : ./uplink-1782 cp --maximum-concurrent-pieces 2550 --long-tail-margin 125 1t-file1 sj://test/1t-file1-1683425868339832
- 2023-05-07T04:24:38 CEST - 9.72 GB / 1.00 TB
- 2023-05-07T04:39:39 CEST - 33.95 GB / 1.00 TB
- 2023-05-07T04:54:39 CEST - 58.53 GB / 1.00 TB
- 2023-05-07T05:09:39 CEST - 82.71 GB / 1.00 TB
- 2023-05-07T05:24:39 CEST - 107.85 GB / 1.00 TB
- 2023-05-07T05:39:39 CEST - 135.37 GB / 1.00 TB
- 2023-05-07T05:54:39 CEST - 163.83 GB / 1.00 TB
- 2023-05-07T06:09:39 CEST - 188.78 GB / 1.00 TB
- 2023-05-07T06:24:39 CEST - 203.20 GB / 1.00 TB
- 2023-05-07T06:39:39 CEST - 232.90 GB / 1.00 TB
- 2023-05-07T06:54:39 CEST - 263.00 GB / 1.00 TB
- 2023-05-07T07:09:39 CEST - 293.53 GB / 1.00 TB
- 2023-05-07T07:24:39 CEST - 320.89 GB / 1.00 TB -- Ran out of fuel - OOM !!!!!

- Upload time (seconds) : 13065 -- Upload failed !!!!!

Finished upload of 1TB file using uplink v1.78.2 @ 2023-05-07T07:57:02 CEST

v1.78.2 ran for more than 3.5 hours (uploading ~32% of the 1TB file) and hit OOM again, despite not using the --inmemory-erasure-coding parameter.
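The average rates can be read straight off the timestamped progress lines. This is a small sketch that parses lines in the format shown above and computes the rate between the first and last sample. It assumes GB here means 1000 MB, and the function names are made up.

```python
import re
from datetime import datetime

# Matches progress lines like "2023-05-07T04:24:38 CEST - 9.72 GB / 1.00 TB"
SAMPLE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) CEST - ([\d.]+) GB /"
)


def parse_samples(log_text):
    """Return (timestamp, uploaded GB) pairs found in the log text."""
    return [(datetime.fromisoformat(ts), float(gb))
            for ts, gb in SAMPLE.findall(log_text)]


def average_rate_mb_s(samples):
    """Average MB/s between the first and the last sample (1 GB = 1000 MB)."""
    (t0, gb0), (t1, gb1) = samples[0], samples[-1]
    return (gb1 - gb0) * 1000 / (t1 - t0).total_seconds()


# Progress lines copied from the two runs above (v1.78.2 shortened to
# its first and last sample).
LOG_1772 = """
- 2023-05-07T03:20:44 CEST - 2.01 GB / 1.00 TB
- 2023-05-07T03:54:38 CEST - 598.55 GB / 1.00 TB
- 2023-05-07T04:09:38 CEST - 859.51 GB / 1.00 TB
"""
LOG_1782 = """
- 2023-05-07T04:24:38 CEST - 9.72 GB / 1.00 TB
- 2023-05-07T07:24:39 CEST - 320.89 GB / 1.00 TB
"""

rate_old = average_rate_mb_s(parse_samples(LOG_1772))
rate_new = average_rate_mb_s(parse_samples(LOG_1782))
print(f"v1.77.2: {rate_old:.0f} MB/s, v1.78.2: {rate_new:.0f} MB/s")
```

On these samples v1.77.2 averages roughly ten times the rate of v1.78.2.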

Th3Van.dk

Th3Van · May 10, 2023, 7:14am