An error occurred (InternalError) when calling the CompleteMultipartUpload operation

Running “aws s3 cp” to copy a local file to sj://something, I’m getting:

An error occurred (InternalError) when calling the CompleteMultipartUpload operation (reached max retries: 4): We encountered an internal error, please try again.

The error appeared after this had been shown for a while:

Completed 16.6 MiB/16.6 MiB (908.2 KiB/s) with 1 file(s) remaining 

The uploaded file seems to be there and appears to be intact. The upload is very fast for the first 14.6 MB, stalls there for a few seconds, then jumps to the line above and sits there for quite some time.

This is 0.31.12 on Linux, against the US satellite, just now. It is about 80% reproducible; the upload succeeds in the other 20% or so of cases.
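
(My working assumption: the aws CLI switches to a multipart upload for anything above its multipart_threshold, which I believe defaults to 8 MB, so a 16.6 MiB file ends with a CompleteMultipartUpload call. Raising the threshold, e.g.

aws configure set default.s3.multipart_threshold 64MB

should make the same file go up as a single PutObject, which might help narrow down whether the problem is multipart-specific.)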

If I use the uplink tool rather than going through the aws tools, I get:

16.58 MiB / 16.58 MiB [----------------------------------------] 100.00% 138.11 KiB p/s
Error: metainfo error: unable to commit object

Can you try running the upload with debug logging and provide the output? For example: uplink --log.level=debug cp file sj://something
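
To capture the full output in one place, something along these lines should work (uplink-debug.log is just a placeholder name):

uplink --log.level=debug cp file sj://something 2>&1 | tee uplink-debug.log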

Does doing uplink rm sj://something fix the problem?

Right now (1-2 hours later) it seems to complete without error, but the timing is still the same. With debug-level logging I get a bunch of entries like the following.

Lots of these, for different nodes:

2020-02-06T22:54:27.780Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "12hF5VnLySteRJjgz8yFK6TqmJCsxDheUugsfTevZMaPyAXuRSK", "error": "protocol: context canceled; context canceled; piecestore: context canceled", "errorVerbose": "group:\n--- protocol: context canceled; context canceled\n\tstorj.io/uplink/piecestore.(*Upload).Write:158\n\tbufio.(*Writer).Flush:593\n\tstorj.io/uplink/piecestore.(*BufferedUpload).Commit:44\n\tstorj.io/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece.func3:237\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:267\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113\n--- piecestore: context canceled\n\tstorj.io/uplink/piecestore.(*Upload).Commit:200\n\tstorj.io/uplink/piecestore.(*BufferedUpload).Commit:45\n\tstorj.io/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece.func3:237\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:267\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113"}
2020-02-06T22:54:27.782Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "12Yj6yZNQuuHSFy4CVabwp7hiRyXvPGeKvzRUMrrfFvRjSw7gTH", "error": "protocol: expected piece hash; context canceled", "errorVerbose": "group:\n--- protocol: expected piece hash\n\tstorj.io/uplink/piecestore.(*Upload).Commit:229\n\tstorj.io/uplink/piecestore.(*BufferedUpload).Commit:45\n\tstorj.io/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece.func3:237\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:267\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113\n--- context canceled"}

and some like these:

2020-02-06T22:54:27.788Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "1a6UoNK9zfRAqZjmHAkdA49EL8ia2bcuat9QwkqNSiTgb49tKT", "error": "piecestore: rpccompat: context canceled", "errorVerbose": "piecestore: rpccompat: context canceled\n\tstorj.io/common/rpc.Dialer.dialTransport:242\n\tstorj.io/common/rpc.Dialer.dial:219\n\tstorj.io/common/rpc.Dialer.DialNode:127\n\tstorj.io/uplink/piecestore.Dial:51\n\tstorj.io/uplink/ecclient.(*ecClient).dialPiecestore:68\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:198\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113"}
2020-02-06T22:53:00.344Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "1Qa1U74ybEt5fAJSorqzcDfZKheqAvoBhzDz9JnsTgWq8ZMopM", "error": "piecestore: rpccompat: dial tcp: lookup dogpound.hopto.org: Try again", "errorVerbose": "piecestore: rpccompat: dial tcp: lookup dogpound.hopto.org: Try again\n\tstorj.io/common/rpc.Dialer.dialTransport:242\n\tstorj.io/common/rpc.Dialer.dial:219\n\tstorj.io/common/rpc.Dialer.DialNode:127\n\tstorj.io/uplink/piecestore.Dial:51\n\tstorj.io/uplink/ecclient.(*ecClient).dialPiecestore:68\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:198\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113"}
2020-02-06T22:53:19.264Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "12ehBFtwtcAB5TZZf5nwn29qVcBZfjGzax8YR2WNR8ErRHzeLg8", "error": "protocol: out of space", "errorVerbose": "protocol: out of space\n\tstorj.io/uplink/piecestore.(*Upload).Write:160\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:240\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113"}
2020-02-06T22:54:27.777Z	DEBUG	ecclient	Node cut from upload due to slow connection	{"Node ID": "12wAitZrj3u3sk9thMCUTAurSFW3CHGTQEGaQSVWTMeqfEYy6nW"}
2020-02-06T22:54:27.777Z	DEBUG	ecclient	Failed dialing for putting piece to node	{"Piece ID": "SY36UV6HK2AECCQZ2PQQZ6W4QVB32C6U4YB2BWTWIFDU2KCV3IXQ", "Node ID": "1a6UoNK9zfRAqZjmHAkdA49EL8ia2bcuat9QwkqNSiTgb49tKT", "error": "piecestore: rpccompat: context canceled", "errorVerbose": "piecestore: rpccompat: context canceled\n\tstorj.io/common/rpc.Dialer.dialTransport:242\n\tstorj.io/common/rpc.Dialer.dial:219\n\tstorj.io/common/rpc.Dialer.DialNode:127\n\tstorj.io/uplink/piecestore.Dial:51\n\tstorj.io/uplink/ecclient.(*ecClient).dialPiecestore:68\n\tstorj.io/uplink/ecclient.(*ecClient).PutPiece:198\n\tstorj.io/uplink/ecclient.(*ecClient).Put.func1:113"}
2020-02-06T22:54:27.779Z	DEBUG	ecclient	Node cut from upload due to slow connection	{"Node ID": "129eZWydKszpJkzqCPo2b8QxcknQJspQSAtaS2H6sJqqfXeQXyU"}
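
If it helps, a quick way to tally the failure types from the saved log (assuming it was written to uplink-debug.log as above) is:

grep -o '"error": "[^"]*"' uplink-debug.log | sort | uniq -c | sort -rn

For this run the counts are dominated by the context-canceled variants, plus the occasional failed DNS lookup and out-of-space node.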