The latest one should be stable. But we implement new features all the time and need to make sure they work as expected. If they don't, we also want to compare the behavior with previous versions to fix the issue.
In general, the latest version should be the best.
This suggests that your network is overloaded (likely the router) and you need to decrease the parallelism, for example with the option --maximum-concurrent-segments (it has a default value of 10 segments).
The next would be --maximum-concurrent-pieces (its default value is 300 pieces).
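(A sketch of what a reduced-concurrency run could look like; the bucket and file names are placeholders:)
# halve segment concurrency and cap concurrent pieces at 100 (defaults: 10 and 300)
$ uplink cp --maximum-concurrent-segments 5 --maximum-concurrent-pieces 100 /path/to/backup.gz sj://my-bucket/backup.gz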
Looks like you are right.
Uplink v1.86.1 also failed to perform the backup:
root-node-1 | 79.1GiB 0:22:35 [83.5MiB/s] [59.8MiB/s] [> ] 2% ETA 17:34:17
root-node-1 | 80.0GiB 0:22:40 [ 176MiB/s] [60.2MiB/s] [> ] 2% ETA 17:26:35
root-node-1 | failed to upload part 78: uplink: encryption: context canceled
root-node-1 | failed to upload part 77: context canceled
root-node-1 | error getting reader for part 80: context canceled
root-node-1 | failed to upload part 79: uplink: context canceled
root-node-1 | 80.0GiB 0:22:40 [60.2MiB/s] [60.2MiB/s] [> ] 2%
root-node-1 |
root-node-1 | 11:52:00: Uploading metadata
Interestingly, it was working without any issue previously.
The networking might be overloaded, this is something I’ll have to check.
Even if the network is overloaded, would it be possible to add some retry mechanism to uplink so that it simply restarts its threads and continues the upload, making it more reliable despite networking blips/congestion? FWIW, each time I restart, the backup runs at 60 MiB/s; I wouldn’t call that too congested/overloaded…
Update
I’ve noticed that the tx-drop counter has a value of 379 on the interface used by the Proxmox host that runs this VM. I’ll keep an eye on it in case of further issues.
You may check with the same old version (it doesn’t have the improvements that use all of your bandwidth), I believe it will work smoothly.
Or you may tune the new version to be as slow as the old one so that it doesn’t overload your network.
I’m now running v1.90.2 with reduced --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 (defaults are 10 and 300 respectively). And the -p 4 -t 4 flags have always been there.
uplink v1.90.2 failed after 8h26m of running with -p 4 -t 4 --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 --progress=false flags.
The tx-drop counter (on the MikroTik interface the host is connected to) hasn’t increased, so I assume the MikroTik router can be ruled out. I could see some dropped packets on the Proxmox host for the VM though; might be something there.
I’ll give --maximum-concurrent-segments=4 --maximum-concurrent-pieces=4 --parallelism 4 a shot.
Logs:
root-node-1 | 1.90TiB 8:26:13 [ 113MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:29
root-node-1 | 1.90TiB 8:26:19 [67.4MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:22
root-node-1 | 1.90TiB 8:26:23 [ 173MiB/s] [65.5MiB/s] [==========> ] 51% ETA 7:57:06
root-node-1 | failed to upload part 1940: uplink: encryption: metaclient: manager closed: closed: read tcp 172.19.0.3:56284->34.150.199.48:7777: read: connection reset by peer
root-node-1 | failed to upload part 1941: context canceled
root-node-1 | error getting reader for part 1943: context canceled
root-node-1 | failed to upload part 1939: uplink: encryption: context canceled
root-node-1 | failed to upload part 1942: uplink: context canceled
I’ve cancelled it and re-run with the uplink cp -p 4 -t 4 --progress=false parameters again (it always used to work with uplink v1.86.1 between Aug/18 and Nov/12 with these params).
Update 2
After 6h8m
No luck with the good old settings and uplink v1.90.2.
This time no RST packets, but encryption: metaclient: internal error instead:
root-node-1 | 1.61TiB 6:08:43 [98.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:54 [45.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:54 [45.4MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:43
root-node-1 | 1.61TiB 6:08:59 [ 113MiB/s] [76.1MiB/s] [========> ] 43% ETA 7:57:33
root-node-1 | failed to upload part 1643: uplink: encryption: metaclient: internal error
root-node-1 | error getting reader for part 1646: context canceled
root-node-1 | failed to upload part 1642: uplink: failed to upload enough pieces (needed at least 80 but got 72)
root-node-1 | failed to upload part 1644: uplink: encryption: context canceled
root-node-1 | failed to upload part 1645: uplink: context canceled
This indicates dropped connections during the upload. The resulting error may be thrown from any place where the cancellation happened.
So these are still not ideal options…
Seems like either -p or -t needs to be reduced, maybe both.
Could you please try the default? I.e. no -p and -t at all?
Sounds like a good idea.
Now running uplink v1.90.2 without the -p / -t flags, i.e. uplink cp --progress=false - sj://...
FWIW, about an hour ago v1.90.2 with my usual flags uplink cp -p 4 -t 4 --progress=false - sj:// failed with failed to upload part 312: uplink: metaclient: write tcp 172.19.0.3:58858->34.172.100.72:7777: write: connection reset by peer; write tcp 172.19.0.3:58858->34.172.100.72:7777: write: connection reset by peer. (I still have some hope it will work, since it had always been working before (Aug/18-Nov/12, v1.86.1).)
I uploaded a 1GB file with uplink v1.76.2 using the default settings and observed no more than 210 connections at any given time. I used this command to watch the active connections:
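(The command isn’t quoted here; a sketch of one way to watch the count, assuming uplink runs directly on the host:)
# refresh every second, counting TCP sockets owned by the uplink process
$ watch -n 1 'ss -tnp | grep -c uplink'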
If you use --maximum-concurrent-segments 1 --maximum-concurrent-pieces 100 with the latest uplink, you should see no more than 220 active connections at any given time. You may try bumping up the maximum concurrent pieces for better performance while watching whether your router can still handle the number of parallel connections.
The backup successfully completed with v1.90.2 when I ran it without -p 4 -t 4!
It took twice as long, though.
I’ll try only with -p 4 now.
Though, since we are using uplink with a pipe, i.e. uploading stdin (<gzip-on-the-fly> | uplink cp - sj://), which forces the data to go into uplink sequentially, would setting -p to anything higher than 1 actually make sense?
-p, --parallelism int Controls how many parallel parts to upload/download from a file (default 1)
I guess -t might be more meaningful in the case of a sequential upload (via stdin: <gzip-on-the-fly> | uplink cp - sj://):
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1)
Update: uplink v1.90.2 with -p 4 only
root-node-1 | 2.46TiB 9:32:01 [71.4MiB/s] [75.1MiB/s] [=============> ] 66% ETA 4:49:49
root-node-1 | 2.46TiB 9:32:06 [ 144MiB/s] [75.1MiB/s] [=============> ] 66% ETA 4:49:37
root-node-1 | failed to upload part 2511: uplink: encryption: metaclient: write tcp 172.19.0.3:58064->34.172.100.72:7777: write: connection reset by peer; write tcp 172.19.0.3:58064->34.172.100.72:7777: write: connection reset by peer
root-node-1 | error getting reader for part 2518: context canceled
root-node-1 | failed to upload part 2516: context canceled
root-node-1 | failed to upload part 2515: uplink: encryption: context canceled
root-node-1 | failed to upload part 2517: uplink: context canceled
So far it worked without -p 4 -t 4, i.e. with the defaults --maximum-concurrent-pieces (300) and --maximum-concurrent-segments (10), but failed with -p 4 alone.
1933440 is the PID of the uplink process (it is namespaced, hence I used nsenter).
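(A sketch of how that count could be taken, assuming this PID:)
# enter the network namespace of the uplink process and count established TCP connections
$ nsenter -t 1933440 -n ss -Htn state established | wc -l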
The -t option is for parallel transfers of multiple files; in the case of a pipe there is only one transfer, so this parameter wouldn’t do anything useful.
Please try -p 2 instead; if it succeeds, you may try -p 3.
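(For example, a piped run with -p 2 could look like this; the bucket and object names are placeholders:)
# same pipeline as before, just with -p 2 instead of -p 4
$ <gzip-on-the-fly> | uplink cp -p 2 --progress=false - sj://my-bucket/backup.gz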
By the way, v1.91.2 should upload and download faster; however, we likely need to find the ideal options for your case anyway, so it is still fast but doesn’t overload your network stack.
Running uplink v1.90.2 with -t 4 only completed successfully.
Will try these options as well. Thanks.
I am still not sure the issue is on my end.
Are there any tools that can test this? I.e. something I could run with a large number of connections to see whether there are any issues.
I usually use the iperf3 tool between two servers physically connected to the single MikroTik switch/router we have; the results were as follows (see image below).
There were some retransmissions (Retr), which could be due to minor packet loss; that is not uncommon in TCP/IP networks. And the number is relatively small given the high volume of data transferred.
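(To approximate uplink’s many-connection pattern rather than a single stream, iperf3 can open parallel TCP streams; a sketch, with 192.0.2.10 standing in for the second server:)
# on the receiving server
$ iperf3 -s
# on the sending server: 100 parallel TCP streams for two minutes
$ iperf3 -c 192.0.2.10 -P 100 -t 120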
I’d expect uplink to attempt to retransmit the data instead of just failing. If that isn’t the case today, is it possible to implement?
I think it is worth adding some hints to the uplink CLI’s --help output to highlight which arguments do not apply in the case of stdin upload/download, e.g.:
(scroll to the far right)
$ uplink --interactive=false cp --help --advanced
...
-t, --transfers int Controls how many uploads/downloads to perform in parallel (default 1) (defaults to 1 when piped)
...
uplink doesn’t retransmit failed pieces; it requests more nodes than needed (130) for each segment (64MiB or less) to start an upload and cancels the remaining ones when the first 80 are finished. This of course produces a lot of connections when you transfer 10 segments in parallel.
If you want a retry feature, you can use rclone instead; it already has one. For a pipe you would use something like the sketch below.
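(A sketch, assuming an rclone remote named storj is already configured; the bucket and object names are placeholders:)
# stream stdin to the remote; rclone retries failed operations (see --retries and --low-level-retries)
$ <gzip-on-the-fly> | rclone rcat storj:my-bucket/backup.gz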
The last time I tried -p 4 -t 4 --maximum-concurrent-segments=5 --maximum-concurrent-pieces=100 --progress=false, it failed after 8h26m.
The backup successfully completes with -p 1 or -p 2 alone.
I am currently trying -p 3 and then -p 4 alone.
If either fails, I’ll lower to --maximum-concurrent-segments 1 --maximum-concurrent-pieces 100 as suggested, while keeping either -p 3 or -p 4 depending on which one fails (if it fails at all). (I might keep increasing it to -p 5, -p 6, and so forth until I break it, to give the suggested --maximum-concurrent-... flags a shot.)