We have a new uplink binary and a new gateway-mt. The main difference is a faster upload, especially for bigger files. The old upload path was segment-based, and it had two big disadvantages. First, reliability: if too many piece transfers failed, the entire segment failed, and a retry was only possible for the whole segment. Second, performance: it uploaded one segment at a time and waited for that segment to finish before starting the next one. It was possible to upload multiple segments at the same time, but that also increased the number of connections and could cause other problems.
The new code retries at the piece level. If too many piece transfers fail, it simply retries those pieces and still finishes the transfer. Instead of concurrent segment transfers, we now have concurrent piece transfers. This also means concurrency can be scaled down to the equivalent of 0.5 concurrent segments or even lower, and even at such a low concurrency there is no pause between two segments. Overall, this makes uploads both more stable and faster.
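To make the "0.5 concurrent segments" equivalence concrete: a segment is uploaded as 110 pieces (per the 110-pieces-per-segment figure used in the parallelism comparison further down), so half a segment's worth of concurrency is roughly 55 pieces. A hypothetical invocation using the --maximum-concurrent-pieces flag shown in the help output below (file and bucket names are placeholders):

./uplink cp --maximum-concurrent-pieces 55 my-file sj://my-bucket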
Today we are also enabling the new upload code in gateway-mt. Note: for better performance, a multipart part size larger than 64 MB is best, for example 1 GB.
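If you talk to gateway-mt through the AWS CLI, for example, the part size can be raised via its s3 configuration (the 1GB value matches the example above; adjust to taste):

aws configure set default.s3.multipart_chunksize 1GB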
Up next:
Libuplink will be switched over to the new code. That will enable tools like rclone to benefit from it.
Also on the roadmap is a similar refactoring for downloads.
This is great news and will definitely help performance for bigger file transfers. I’m wondering: could multiple files be treated the same way over native connections, i.e. starting the upload of the next file as soon as the first piece transfer of the previous file has finished? I have always been a little confused about why concurrency and parallelism are two separate options. Especially with this new way of doing things, having concurrency work across files would be much more efficient.
root@server030:/disk103/tmp/uplink-test# time ./uplink-1782 cp 1g-file1 sj://test
upload 1g-file1 to sj://test/1g-file1
1g-file1 1.00 GB / 1.00 GB [=======================================] 100.00% 39.63 MiB/s
real 2m4,481s
user 2m58,373s
sys 2m26,370s
Also, it looks like the --parallelism option has been removed from uplink now.
root@server030:/disk103/tmp/uplink-test# ./uplink-1782 cp -h --advanced
Usage:
uplink cp [--access string] [--recursive] [--transfers int] [--dry-run] [--progress] [--range string] [--maximum-concurrent-pieces int] [--long-tail-margin int] [--inmemory-erasure-coding] [--expires relative_date] [--metadata string] [locations ...]
Copies files or objects into or out of storj
Arguments:
locations Locations to copy (at least one source and one destination). Use - for standard input/output
Flags:
--access string Access name or value to use
-r, --recursive Perform a recursive copy
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1)
--dry-run Print what operations would happen but don't execute them
--progress Show a progress bar when possible (default true)
--range string Downloads the specified range bytes of an object. For more information about the HTTP Range header, see https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
--maximum-concurrent-pieces int Maximum concurrent pieces to upload at once per transfer (default 300)
--long-tail-margin int How many extra pieces to upload and cancel per segment (default 50)
--inmemory-erasure-coding Keep erasure-coded pieces in-memory instead of writing them on the disk during upload
--expires relative_date Schedule removal after this time (e.g. '+2h', 'now', '2020-01-02T15:04:05Z0700')
--metadata string optional metadata for the object. Please use a single level JSON object of string to string only
Parallelism 10 would be 10 × 110 = 1100 concurrent piece transfers, while the default for the new uplink is just 300. That’s an unfair comparison. I would expect the new uplink with --maximum-concurrent-pieces 1100 to be at least the same speed.
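For example, a fairer head-to-head against the old -p 10 run would be something like this (same test file and bucket as in the session above):

./uplink-1782 cp --maximum-concurrent-pieces 1100 1g-file1 sj://test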
There is also --long-tail-margin, which might have an impact. For my limited internet connection I would go with a lower value to reduce the overhead. I haven’t tested that yet.
I’ve put together a little script to test uploads using an MCP (--maximum-concurrent-pieces) range from 100 → 5000, combined with an LTM (--long-tail-margin) range of 25 → 300, to see how much the upload time differs when uploading a 1 GB file.
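In rough outline, the loop looks something like this (a simplified sketch: the step values, binary name, bucket and file name here are illustrative, not the exact ones used):

#!/bin/bash
# Sweep --maximum-concurrent-pieces (MCP) and --long-tail-margin (LTM)
# combinations and time the upload of the same 1 GB test file for each.
for mcp in 100 500 1000 2500 5000; do
  for ltm in 25 50 150 300; do
    start=$(date +%s)
    ./uplink-1782 cp --maximum-concurrent-pieces "$mcp" --long-tail-margin "$ltm" \
      1g-file1 "sj://test/1g-file1-$(date +%s%N)"
    end=$(date +%s)
    echo "MCP=$mcp LTM=$ltm : $((end - start)) seconds"
  done
done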
Testing with the same 1 GB file and using all the “21 seconds” (MCP/LTM) parameter combinations, the results now show higher upload times than before:
Starting upload of 1GB file using uplink v1.77.2 @ 2023-05-06T10:55:51 CEST
- Command : ./uplink-1772 cp -p 10 1g-file1 sj://test/1g-file1-1683363351939651
- Upload time (seconds) : 5
Finished upload of 1GB file using uplink v1.77.2 @ 2023-05-06T10:55:57 CEST
------
Starting upload of 1GB files using uplink v1.78.2 @ 2023-05-06T10:55:57 CEST
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 350 --long-tail-margin 150 1g-file1 sj://test/1g-file1-1683363357850288
- Upload time (seconds) : 94
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 450 --long-tail-margin 250 1g-file1 sj://test/1g-file1-1683363452665593
- Upload time (seconds) : 80
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 2550 --long-tail-margin 125 1g-file1 sj://test/1g-file1-1683363533339900
- Upload time (seconds) : 68
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4000 --long-tail-margin 200 1g-file1 sj://test/1g-file1-1683363601588088
- Upload time (seconds) : 75
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4250 --long-tail-margin 150 1g-file1 sj://test/1g-file1-1683363677493691
- Upload time (seconds) : 62
- Command : ./uplink-1782 cp --maximum-concurrent-pieces 4300 --long-tail-margin 50 1g-file1 sj://test/1g-file1-1683363739572397
- Upload time (seconds) : 72
Finished upload of 1GB files using uplink v1.78.2 @ 2023-05-06T11:03:31 CEST
And just to be sure, I’m currently uploading a 1 TB file using uplink v1.77.2 and then v1.78.2 with the --maximum-concurrent-pieces 2550 and --long-tail-margin 125 parameters.
@Th3Van thanks for checking this. It’s interesting to see. Could you try uploading with the --inmemory-erasure-coding setting active on the new version?
Sorry, the --inmemory-erasure-coding parameter wouldn’t work for a 1 TB file if you only have 512 GB of RAM. I was wondering if it makes a difference on the 1 GB file.