We have a new uplink binary and a new gateway-mt. The main difference is faster uploads, especially for bigger files. The old upload path was segment-based, which had two big disadvantages. First, reliability: if too many piece transfers failed, the entire segment failed, and a retry was only possible for the whole segment. Second, performance: it uploaded one segment at a time and waited for it to finish before starting the next. Uploading multiple segments concurrently was possible, but that also increased the number of connections and could cause other problems.
The new code retries at the piece level: if too many piece transfers fail, it simply retries those pieces and still finishes the transfer. Instead of concurrent segment transfers, we now have concurrent piece transfers. This also means concurrency can be scaled down to the equivalent of 0.5 concurrent segments or lower, and even at such a low concurrency there is no pause between two segments. Overall, a much more stable and faster upload.
This is great news and will definitely help performance for bigger file transfers. I'm wondering: could multiple files be treated the same way over native connections, i.e. starting the upload of the next file as soon as the first piece transfer of the previous file has finished? I have always been a little confused about why concurrency and parallelism are two separate options. Especially with this new way of doing things, having concurrency work across files would be much more efficient.
root@server030:/disk103/tmp/uplink-test# time ./uplink-1782 cp 1g-file1 sj://test
upload 1g-file1 to sj://test/1g-file1
1g-file1 1.00 GB / 1.00 GB [=======================================] 100.00% 39.63 MiB/s
Also, it looks like the --parallelism option has been removed from uplink now.
root@server030:/disk103/tmp/uplink-test# ./uplink-1782 cp -h --advanced
uplink cp [--access string] [--recursive] [--transfers int] [--dry-run] [--progress] [--range string] [--maximum-concurrent-pieces int] [--long-tail-margin int] [--inmemory-erasure-coding] [--expires relative_date] [--metadata string] [locations ...]
Copies files or objects into or out of storj
locations Locations to copy (at least one source and one destination). Use - for standard input/output
--access string Access name or value to use
-r, --recursive Peform a recursive copy
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1)
--dry-run Print what operations would happen but don't execute them
--progress Show a progress bar when possible (default true)
--range string Downloads the specified range bytes of an object. For more information about the HTTP Range header, see https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
--maximum-concurrent-pieces int Maximum concurrent pieces to upload at once per transfer (default 300)
--long-tail-margin int How many extra pieces to upload and cancel per segment (default 50)
--inmemory-erasure-coding Keep erasure-coded pieces in-memory instead of writing them on the disk during upload
--expires relative_date Schedule removal after this time (e.g. '+2h', 'now', '2020-01-02T15:04:05Z0700')
--metadata string optional metadata for the object. Please use a single level JSON object of string to string only
Parallelism 10 would be 10 * 110 = 1100 piece transfers, while the default for the new uplink is just 300 piece transfers. That's an unfair comparison; I would expect the new uplink with --maximum-concurrent-pieces 1100 to be at least as fast.
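To make that arithmetic concrete, here is a small sketch of the comparison. It assumes roughly 110 uploaded pieces per segment; the actual Reed-Solomon numbers may differ, and the uplink invocation at the end is illustrative, not a benchmark result.

```shell
# Back-of-the-envelope: old --parallelism vs new --maximum-concurrent-pieces.
# Assumes ~110 uploaded pieces per segment; real RS settings may differ.
parallelism=10
pieces_per_segment=110
equivalent_mcp=$((parallelism * pieces_per_segment))
echo "$equivalent_mcp"   # 1100

# So a comparable new-uplink run would look like (illustrative only):
# ./uplink cp --maximum-concurrent-pieces "$equivalent_mcp" 1g-file1 sj://test
```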
There is also --long-tail-margin, which might have an impact. For my limited internet connection I would go with a lower value to reduce the overhead; I haven't tested that yet.
I’ve put together a little script that tests uploads with an MCP (--maximum-concurrent-pieces) range from 100 → 5000, combined with an LTM (--long-tail-margin) range of 25 → 300, to see how much the upload time differs when uploading a 1 GB file.
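A minimal sketch of such a sweep, assuming the flag names from the help output above; the uplink path, bucket name, and the specific MCP/LTM steps are placeholders, and the actual upload command is left commented out so the grid itself can be checked without running a transfer:

```shell
#!/bin/sh
# Sweep MCP (--maximum-concurrent-pieces) and LTM (--long-tail-margin)
# and time each upload of the same 1 GB test file.
# Binary path, bucket, and step values below are illustrative placeholders.
UPLINK=./uplink-1782
SRC=1g-file1
DST=sj://test

for mcp in 100 300 1100 5000; do
  for ltm in 25 50 150 300; do
    echo "MCP=$mcp LTM=$ltm"
    # Uncomment to run the actual timed upload:
    # time "$UPLINK" cp \
    #   --maximum-concurrent-pieces "$mcp" \
    #   --long-tail-margin "$ltm" \
    #   "$SRC" "$DST"
  done
done
```

With the upload line commented out, the script just prints the 16 parameter combinations, which is a quick way to sanity-check the grid before burning bandwidth on real transfers.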