The latest one should be stable. But we implement new features all the time and need to make sure they work as expected. If they don't, we also want to compare the behavior with previous versions to fix the issue.
In general, the latest version should be the best.
This suggests that your network is overloaded (likely the router) and you need to decrease the parallelism, for example with the option --maximum-concurrent-segments (it has a default value of 10 segments).
The next would be --maximum-concurrent-pieces (its default value is 300 pieces).
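(A sketch of what a reduced-concurrency run could look like; the bucket and file names are placeholders:)
# halve segment concurrency and cap concurrent pieces at 100 (defaults: 10 and 300)
$ uplink cp --maximum-concurrent-segments 5 --maximum-concurrent-pieces 100 /path/to/backup.gz sj://my-bucket/backup.gz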
Looks like you are right.
Uplink v1.86.1 also failed to perform the backup:
root-node-1 | 79.1GiB 0:22:35 [83.5MiB/s] [59.8MiB/s] [> ] 2% ETA 17:34:17
root-node-1 | 80.0GiB 0:22:40 [ 176MiB/s] [60.2MiB/s] [> ] 2% ETA 17:26:35
root-node-1 | failed to upload part 78: uplink: encryption: context canceled
root-node-1 | failed to upload part 77: context canceled
root-node-1 | error getting reader for part 80: context canceled
root-node-1 | failed to upload part 79: uplink: context canceled
root-node-1 | 80.0GiB 0:22:40 [60.2MiB/s] [60.2MiB/s] [> ] 2%
root-node-1 |
root-node-1 | 11:52:00: Uploading metadata
Interestingly, it was working without any issue previously.
The networking might be overloaded, this is something I’ll have to check.
Even if the network is overloaded, would it be possible to add some retry mechanism to uplink so that it simply restarts its threads and continues the upload, making it more reliable despite networking blips/congestion? FWIW, each time I restart, the backup runs at 60 MiB/s; I wouldn’t call that too congested/overloaded…
Update
I’ve noticed that the tx-drop counter has a value of 379 on the interface used by the Proxmox host that runs this VM. I’ll keep an eye on it in case of further issues.
You may check with the same old version (it doesn’t have the improvements that use all of your bandwidth), I believe it will work smoothly.
Or you may tune the new version to be as slow as the old one so that it doesn’t overload your network.
I’m now running v1.90.2 with reduced --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 (defaults are 10 and 300 respectively). And the -p 4 -t 4 flags have always been there.
uplink v1.90.2 failed after 8h26m of running with -p 4 -t 4 --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 --progress=false flags.
The tx-drop counter (on the MikroTik interface the host is connected to) hasn’t increased, so I assume the MikroTik router can be ruled out. I could see some dropped packets on the Proxmox host for the VM though; might be something there.
I’ll give --maximum-concurrent-segments=4 --maximum-concurrent-pieces=4 --parallelism 4 a shot.
Logs:
root-node-1 | 1.90TiB 8:26:13 [ 113MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:29
root-node-1 | 1.90TiB 8:26:19 [67.4MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:22
root-node-1 | 1.90TiB 8:26:23 [ 173MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:06
root-node-1 | failed to upload part 1940: uplink: encryption: metaclient: manager closed: closed: read tcp 172.19.0.3:56284->34.150.199.48:7777: read: connection reset by peer
root-node-1 | failed to upload part 1941: context canceled
root-node-1 | error getting reader for part 1943: context canceled
root-node-1 | failed to upload part 1939: uplink: encryption: context canceled
root-node-1 | failed to upload part 1942: uplink: context canceled
I’ve cancelled it and re-run with the uplink cp -p 4 -t 4 --progress=false parameters again (it always used to work with uplink v1.86.1 between Aug/18 and Nov/12 with these params).
Update 2
After 6h8m
No luck with the good old settings and uplink v1.90.2.
This time no RST packets, but encryption: metaclient: internal error instead:
root-node-1 | 1.61TiB 6:08:43 [98.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:54 [45.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:54 [45.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:59 [ 113MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:33
root-node-1 | failed to upload part 1643: uplink: encryption: metaclient: internal error
root-node-1 | error getting reader for part 1646: context canceled
root-node-1 | failed to upload part 1642: uplink: failed to upload enough pieces (needed at least 80 but got 72)
root-node-1 | failed to upload part 1644: uplink: encryption: context canceled
root-node-1 | failed to upload part 1645: uplink: context canceled
This indicates dropped connections during the upload. The resulting error may be thrown from any place where the cancellation happened.
So these are still not ideal options…
Seems like either -p or -t needs to be reduced, maybe both.
Could you please try the default? I.e. no -p and -t at all?
Sounds like a good idea.
Now running uplink v1.90.2 without the -p / -t flags, i.e. uplink cp --progress=false - sj://...
FWIW, about an hour ago v1.90.2 with my usual flags uplink cp -p 4 -t 4 --progress=false - sj:// failed with failed to upload part 312: uplink: metaclient: write tcp 172.19.0.3:58858->34.172.100.72:7777: write: connection reset by peer; write tcp 172.19.0.3:58858->34.172.100.72:7777: write: connection reset by peer. (I still have some hope it will work, since it had always been working before (Aug/18-Nov/12, v1.86.1).)
I uploaded a 1GB file with uplink v1.76.2 using the default settings and observed no more than 210 connections at any given time. I used this command to watch the active connections:
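(The command isn’t quoted here; a sketch of one way to watch the count, assuming uplink runs directly on the host:)
# refresh every second, counting TCP sockets owned by the uplink process
$ watch -n 1 'ss -tnp | grep -c uplink'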
If you use --maximum-concurrent-segments 1 --maximum-concurrent-pieces 100 with the latest uplink, you should see no more than 220 active connections at any given time. You may try bumping up the maximum concurrent pieces for better performance while watching whether your router can still handle the number of parallel connections.
The backup successfully completed with v1.90.2 when I ran it without -p 4 -t 4!
It took twice as long, though.
I’ll try only with -p 4 now.
Though, since we are using uplink with a pipe, i.e. uploading stdin (<gzip-on-the-fly> | uplink cp - sj://), which forces the data to go into uplink sequentially, would setting -p to anything higher than 1 actually make sense?
-p, --parallelism int Controls how many parallel parts to upload/download from a file (default 1)
I guess -t might be more meaningful in the case of a sequential upload (via stdin: <gzip-on-the-fly> | uplink cp - sj://):
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1)
Update: uplink v1.90.2 with -p 4 only
root-node-1 | 2.46TiB 9:32:01 [71.4MiB/s] [75.1MiB/s] [=============> ] 66% ETA 4:49:49
root-node-1 | 2.46TiB 9:32:06 [ 144MiB/s] [75.1MiB/s] [=============> ] 66% ETA 4:49:37
root-node-1 | failed to upload part 2511: uplink: encryption: metaclient: write tcp 172.19.0.3:58064->34.172.100.72:7777: write: connection reset by peer; write tcp 172.19.0.3:58064->34.172.100.72:7777: write: connection reset by peer
root-node-1 | error getting reader for part 2518: context canceled
root-node-1 | failed to upload part 2516: context canceled
root-node-1 | failed to upload part 2515: uplink: encryption: context canceled
root-node-1 | failed to upload part 2517: uplink: context canceled
So far it worked without -p 4 -t 4, i.e. with the defaults --maximum-concurrent-pieces (300) and --maximum-concurrent-segments (10), but failed with -p 4 alone.
1933440 is the PID of the uplink process (it is namespaced, hence I used nsenter).
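(A sketch of how that count could be taken, assuming this PID:)
# enter the network namespace of the uplink process and count established TCP connections
$ nsenter -t 1933440 -n ss -Htn state established | wc -l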
The -t option is for parallel transfers of multiple files; in the case of a pipe there is only one transfer, so this parameter wouldn’t do anything useful.
Please try -p 2 instead; if it succeeds, you may try -p 3.
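(For example, a piped run with -p 2 could look like this; the bucket and object names are placeholders:)
# same pipeline as before, just with -p 2 instead of -p 4
$ <gzip-on-the-fly> | uplink cp -p 2 --progress=false - sj://my-bucket/backup.gz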
By the way, v1.91.2 should upload and download faster; however, we likely need to find the ideal options for your case anyway, so it is still fast but doesn’t overload your network stack.
Running uplink v1.90.2 with -t 4 only completed successfully.
Will try these options as well. Thanks.
I am still not sure the issue is on my end.
Are there any tools that can test this? I.e. something I could run with a large number of connections to see whether there are any issues.
I usually use the iperf3 tool between two servers physically connected to the single MikroTik switch/router we have; the results were as follows (see image below).
There were some retransmissions (Retr), which could be due to minor packet loss; that is not uncommon in TCP/IP networks. And the number is relatively small given the high volume of data transferred.
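(To approximate uplink’s many-connection pattern rather than a single stream, iperf3 can open parallel TCP streams; a sketch, with 192.0.2.10 standing in for the second server:)
# on the receiving server
$ iperf3 -s
# on the sending server: 100 parallel TCP streams for two minutes
$ iperf3 -c 192.0.2.10 -P 100 -t 120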
I’d expect uplink to attempt to retransmit the data instead of just failing. If that isn’t the case today, is it possible to implement?
I think it is worth adding some hints to the uplink CLI’s --help output to highlight which arguments do not apply in the case of stdin upload/download, e.g.:
(scroll to the far right)
$ uplink --interactive=false cp --help --advanced
...
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1) (defaults to 1 when piped)
...
uplink doesn’t retransmit failed pieces; it requests more nodes than needed (130) for each segment (64MiB or less) to start an upload and cancels the remaining ones when the first 80 are finished. This of course produces a lot of connections when you transfer 10 segments in parallel.
If you want a retry feature, you can use rclone instead; it already has one. For a pipe you would use something like the sketch below.
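(A sketch, assuming an rclone remote named storj is already configured; the bucket and object names are placeholders:)
# stream stdin to the remote; rclone retries failed operations (see --retries and --low-level-retries)
$ <gzip-on-the-fly> | rclone rcat storj:my-bucket/backup.gz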
The last time I tried -p 4 -t 4 --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 --progress=false, it failed after 8h26m.
The backup successfully completes with -p 1 or -p 2 alone.
I am currently trying -p 3 and then -p 4 alone.
If either fails, I’ll lower to --maximum-concurrent-segments 1 --maximum-concurrent-pieces 100 as suggested, while keeping either -p 3 or -p 4 depending on which one fails (if it fails at all). (I might keep increasing it to -p 5, -p 6, and so forth until I break it, to give the suggested --maximum-concurrent-... flags a shot.)