Error - copying files

I’m in the process of copying my buckets from another S3 storage to Storj.

Some copies go smoothly, with thousands of files being copied without a single error.

However, in some of these copy processes (like the one I’m running at the moment), errors occur repeatedly. In this case, there are about 2,000 files to be copied, and every few dozen files the process is interrupted with this error:

Failed to upload the file [redacted]: InternalError: We encountered an internal error, please try again.: cause(uplink: stream: metaclient: metabase: object already exists)
        status code: 500, request id: 16FE0A34025FD8A6, host id:

(The error occurs with different files on each try)

Any idea?

The performance of successive copy runs gets worse until it becomes impractical, with only a few files copied per run:

Files to copy: 2295, to skip: 23302, total: 25597

Copied file [redacted] (1/2295) 1.12MB/s 02:35:02 0.0%
Copied file [redacted] (2/2295) 2.64MB/s 01:30:46 0.1%
Copied file [redacted] (3/2295) 466KB/s 00:36:04 0.1%

Failed to upload the file [redacted]: InternalError: We encountered an internal error, please try again.: cause(uplink: stream: metaclient: metabase: object already exists)
        status code: 500, request id: 16FE187C3B471588, host id:

The other times it happened, with other buckets, I usually (though not always) managed to solve it only by completely deleting the bucket itself and recreating it.

But there must be a better way to fix this error, once the root cause is identified…

What S3 client are you using?

Have you tried rclone? Then you can use the S3 protocol for the source and the native Storj protocol for the destination.
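
For example, a rough sketch of what that might look like, assuming an existing S3 remote named "wasabi" for the source and a native "storj" remote created from an access grant (the remote names, bucket names, and grant are placeholders):

rclone config create storj storj access_grant YOUR_ACCESS_GRANT
rclone copy wasabi:source-bucket storj:destination-bucket --progress

The native protocol uploads directly to the storage nodes instead of going through the S3 gateway.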

The files are chunks of backups generated by Duplicacy, so I have to use Duplicacy itself to do the copy; I can’t use Cyberduck or something like that.

I can’t use Rclone because the chunks change during copying. Duplicacy reads the source chunk, decrypts it, encrypts it with the new storage (DCS) key, and uploads it.

(There is an option, set when the storage is created by Duplicacy, that would make the chunks copyable without needing to be decrypted, in which case I could use Rclone; but since I’m changing the encryption key, I cannot use this option.)

Could you share what command you’re running to encounter this problem with Duplicacy?

I noticed there are -debug and -verbose flags which might help show what requests it is making to the S3 gateways at the time it encounters an error.

Sure, the command is very simple:

duplicacy -v copy -key private-key.pem -key-passphrase ***** -from [wasabi storage] -to [storj storage]

The above outputs were generated with the -v (verbose) option.

Running with -d (debug) doesn’t seem to bring up any more useful information:

Copied chunk 47e8ee55df7a77ee774c14649a6d5c3556994fcd75032d8d3bdf4c387cf4a78c (11/1372) 1.74MB/s 00:14:06 0.9%
Chunk 6afb893599652f63efe3d6e7795bb52b0243b4baf7c2782d4ae5adda7cc18a59 has been downloaded
Copying chunk cb0b92745c6f37fc1a2b5c77021c8f02de8273999979fe3dc286c91e24934563 to 338af2774a32eeb6216a267fbd334a27920688d58a4b2130010803d35ac6e9e6
Fetching chunk cb0b92745c6f37fc1a2b5c77021c8f02de8273999979fe3dc286c91e24934563
Chunk 20ba4c47770d08fb5e901f7856de5ec79a3b48ba7cffe76f3ebc790ee417eba2 has been uploaded
Copied chunk 20ba4c47770d08fb5e901f7856de5ec79a3b48ba7cffe76f3ebc790ee417eba2 (12/1372) 3.63MB/s 01:31:15 0.1%
Chunk cb0b92745c6f37fc1a2b5c77021c8f02de8273999979fe3dc286c91e24934563 has been downloaded
Copying chunk 0d37a33ee4ab19e4d90fabe61f021822da58d3458217d6f822ceb6bf4002c2bc to 48eb2fa456829222ff983a5158ad42e1059e6e672d19f13ffc68874edfa7e610
Fetching chunk 0d37a33ee4ab19e4d90fabe61f021822da58d3458217d6f822ceb6bf4002c2bc
Chunk 0d37a33ee4ab19e4d90fabe61f021822da58d3458217d6f822ceb6bf4002c2bc has been downloaded
Copying chunk ecfecf905657f82d5e38f2bc75185261796e06076090345d4288b5b1457acb61 to f1889ec3c197d49d4d52451991725b19d276f9fa10765d5c44b6a1392116caae
Fetching chunk ecfecf905657f82d5e38f2bc75185261796e06076090345d4288b5b1457acb61
Chunk ecfecf905657f82d5e38f2bc75185261796e06076090345d4288b5b1457acb61 has been downloaded
Copying chunk 5de570b59a9365b49694d1bc1f4ce0afaf424ca39eae9898ab59db5b838bbc71 to a3c743c96dc577240bed61805018cf34c27b6f1d27d841f0464cc4eea1654134
Fetching chunk 5de570b59a9365b49694d1bc1f4ce0afaf424ca39eae9898ab59db5b838bbc71

Failed to upload the chunk 9fc346a516e7fa08f1df7cd81df7f5b9a8f9c57455ba671adaac3fccb8241ac6: InternalError: We encountered an internal error, please try again.: cause(uplink: stream: metaclient: metabase: object already exists)
        status code: 500, request id: 16FEEC4FBF6E15EE, host id:

Are you sure that multiple files don’t share the same name? That could cause this issue.

The source storage doesn’t allow duplicate names, but I tested it with Rclone anyway:

rclone dedupe wasabi:[bucket]
2022/07/05 17:05:28 NOTICE: S3 bucket [bucket name]: Can't have duplicate names here. Perhaps you wanted --by-hash ? Continuing anyway.

Thanks! One case we discovered is that when new uploads are created concurrently for the same object key, the “object already exists” error is sometimes thrown. I was only able to replicate this synthetically by creating many uploads with the same key in parallel. We are investigating how and why some S3 clients send requests in a way that can lead to this error.
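
For reference, a synthetic reproduction can be as simple as firing several uploads at the same key in parallel through the gateway; a sketch using the AWS CLI (not necessarily how we tested it; the bucket name, key, and test file are placeholders):

# upload the same object key 20 times concurrently through the Storj S3 gateway
for i in $(seq 1 20); do
  aws s3 cp ./test-file s3://test-bucket/same-key --endpoint-url https://gateway.us1.storjshare.io &
done
wait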

I noticed there’s a -threads option to Duplicacy. I’m not sure what it defaults to, but does setting it to 1 help you at all for the moment?
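
For example, something like this, adapting your earlier command (I’m assuming -threads goes after copy like the other options):

duplicacy -v copy -threads 1 -key private-key.pem -key-passphrase ***** -from [wasabi storage] -to [storj storage]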

Yes, that was one of the first options I thought of, and even with -threads 1 the error occurs.

It defaults to 4.

Edit: to add to this, with the -threads 1 option the copy seems to advance a little further (something like a dozen files copied, across the several attempts I made), but it’s not entirely consistent.

Another point to consider is that some other buckets, even larger ones with more files, copied without any problem. I couldn’t identify any characteristic that determines success or failure; they are all buckets with Duplicacy backup files.

The other times this error occurred, the only solution I found was to completely delete the bucket, recreate it with the same name, and redo the copy. Sometimes it worked on the second try, sometimes on the third.

Well, I “reset” the bucket like I did with the other buckets where the same error occurred:

  • completely deleted the bucket
  • recreated the bucket with the same name (and same password, passphrase, etc)
  • reran the copy
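
For reference, the bucket part of that reset can also be done from the command line; a rough sketch with the uplink CLI, assuming access is already configured and using a placeholder bucket name (the Duplicacy storage then gets re-initialized on top as before):

uplink rb --force sj://my-duplicacy-bucket
uplink mb sj://my-duplicacy-bucket

(--force should also remove any remaining objects, if I recall correctly.)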

Result: over 25,000 files (exactly the same files as the previous attempt) copied without a single error…

Thank you for this! That will give me something to go on in trying to replicate the problem. Also, it’s great you were able to successfully finish the copy at least!

Once the copies were finished, I pointed the backup scripts to the new buckets.

After a few days of running without problems, the same error occurred on a backup that was resumed after being interrupted:

ERROR UPLOAD_CHUNK Failed to upload the chunk 61ab...e8ab: InternalError: We encountered an internal error, please try again.: cause(uplink: stream: metaclient: metabase: object already exists)

:frowning_face:

Could this perhaps have something to do with how asynchronous deletes are implemented? Like if a delete and an upload for the same filename happened in quick succession?
Not sure if Duplicacy would even have that behavior, though it might if there is an update to a file.

I waited a day and looked for this file (chunk) in the bucket in various ways (web interface, Rclone, Cyberduck) and… it’s not there… In other words, there’s no way to even manually delete it and resume the backup. The only way would be to completely erase the bucket and start over, which is obviously not practical.
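
For example, something along these lines over an rclone S3 remote pointed at the gateway (the remote and chunk names are placeholders; Duplicacy stores chunks under a chunks/ prefix, if I’m not mistaken) returns nothing:

rclone lsf --recursive storj-s3:[bucket]/chunks | grep <chunk-name>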

It looks like only the metadata of the file is in a database (?) somewhere.

No, chunks are immutable once uploaded. They can only be deleted when a prune is performed. And Duplicacy only saves the backup snapshot (something like an index of the chunks) after all chunks from that snapshot have been uploaded. That is, chunks are not updated.


One other thing: are you using s3 or s3c for the backend in Duplicacy?

s3://us-central-1@gateway.us1.storjshare.io/bucket

Have you tried using s3c:// instead? That backend uses different S3 client code in Duplicacy that has a delay between retrying requests, which may help you to avoid these errors.
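
Presumably that just means swapping the scheme in the storage URL, e.g. (based on the URL above; I believe the storage URL can be edited in .duplicacy/preferences):

s3c://us-central-1@gateway.us1.storjshare.io/bucket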


We currently have an “object consistency” initiative that will allow multiple pending objects under the same key, which will get rid of this error.
