Uploads via S3 gateway; successful puts less than success threshold

Just received my Tardigrade invite. Signed up on us-central and got an API key. Configured gateway_linux_amd64 according to the docs. Attempting to upload a folder of 1758 files (35 MB total) using aws s3 sync.

Seeing these on the client:

    2020/01/02 18:02:04 ERROR : myfile.txt: Failed to copy: s3 upload: 500 Internal Server Error: <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>InternalError</Code><Message>We encountered an internal error, please try again.</Message><Key></Key><BucketName></BucketName><Resource>/cherking/myfile.txt</Resource><RequestId>3L137</RequestId><HostId>3L137</HostId></Error>

And seeing this on the output of gateway:

    2020-01-02T17:58:20.906Z        ERROR   gateway error:  {"error": "segment error: ecclient error: successful puts (61) less than success threshold (80)", "errorVerbose": "segment error: ecclient error: successful puts (61) less than success threshold (80)\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put:176\n\tstorj.io/storj/uplink/storage/segments.(*segmentStore).Put:74\n\tstorj.io/storj/uplink/storage/streams.(*streamStore).upload:257\n\tstorj.io/storj/uplink/storage/streams.(*streamStore).Put:113\n\tstorj.io/storj/uplink/storage/streams.(*shimStore).Put:50\n\tstorj.io/storj/uplink/stream.NewUpload.func1:52\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

The error indicates that at the time you attempted this folder sync, there were not enough nodes available to ensure the file would have enough piece redundancy (the upload success threshold is currently 80 nodes). Have you tried this several times and still seen the same issue? We already have a dev looking into the problem now. Trying to sync all 1758 files in the folder at the same time may be a bit much; could you try a folder with fewer files in it?
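To make the "successful puts (61) less than success threshold (80)" message concrete: a segment is erasure-coded into pieces that are uploaded to many nodes, and the upload only counts as successful if enough pieces land. The sketch below is purely illustrative (the function name and structure are made up, not taken from the storj codebase; only the numbers come from the log above):

```python
# Illustrative sketch of the success-threshold check reported in the
# gateway log. Names here are hypothetical, not the real uplink API.

def segment_upload_ok(successful_puts: int, success_threshold: int = 80) -> bool:
    """A segment upload succeeds only if enough piece uploads landed."""
    return successful_puts >= success_threshold

# The log reported 61 successful puts against a threshold of 80:
print(segment_upload_ok(61))  # False -> the "less than success threshold" error
print(segment_upload_ok(85))  # True  -> upload would have succeeded
```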

Please check the version of your gateway. I bet you are running an old version that uses gRPC. The storage nodes accept only 7 concurrent gRPC uploads, so anything beyond that blocks; DRPC has no such limit.

Updating to the latest gateway version should fix it.


I downloaded the gateway from here about 30 minutes before posting this:
https://github.com/storj/storj/releases/latest/download/gateway_linux_amd64.zip

I’m actually using rclone to sync this directory. rclone uses the S3 protocol and limits me to only 4 simultaneous uploads at a time.
I’m not sure whether it is rclone, the gateway, or Tardigrade in general, but in the past 30 minutes I’ve only managed to upload 4.4 MB of data. It’s giving me an estimate of 2.3 hrs to upload 35 MB. That’s not very good.
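As a sanity check on those numbers (plain arithmetic, assuming the rate stays constant):

```python
# Back-of-the-envelope check of the reported throughput:
# 4.4 MB uploaded in 30 minutes, out of 35 MB total.
total_mb = 35.0
done_mb = 4.4
elapsed_min = 30.0

rate_mb_per_min = done_mb / elapsed_min            # ~0.147 MB/min
eta_min = (total_mb - done_mb) / rate_mb_per_min   # time for the remaining 30.6 MB

print(f"{rate_mb_per_min:.3f} MB/min, ~{eta_min / 60:.1f} h remaining")
# -> 0.147 MB/min, ~3.5 h remaining
```

The client's 2.3 hr estimate is in the same ballpark; it presumably extrapolates from a more recent transfer rate. Either way, hours for 35 MB is far below expectations.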

I have seen even our own developers download it and still end up running a different version. Please check the output of the gateway binary you are executing; gateway version should print it out.

# ./gateway_linux_amd64 version
2020-01-02T18:42:47.871Z	INFO	Configuration loaded from: /root/.local/share/storj/gateway/config.yaml
Release build
Version: v0.28.4
Build timestamp: 01 Jan 20 11:54 UTC
Git commit: 5c25d3d6d31dde11405aa28281d37c1e4338e1de

Great. Thank you for confirming it.

Now please run your gateway with log level debug. You can do that via the config file or by adding --log.level debug
This should hopefully print out the error messages the storage nodes are returning. I place my second bet on a clock that is out of sync.
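For the config-file route, a minimal sketch (assuming the config key mirrors the flag name; the file path is the one shown in the gateway's own startup output):

```yaml
# /root/.local/share/storj/gateway/config.yaml
# Assumed key, mirroring the --log.level flag:
log.level: debug
```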

I’ve switched to the official aws cli, just to rule out anything rclone may be doing. Getting even slower results. In the debug logging of the gateway, seeing these:

2020-01-02T18:48:26.710Z	DEBUG	ecclient	Upload to storage node failed	{"Node ID": "143BFywMhDey8ShCU1pAhpVF4FQKe6fbamnSjPQhfjCMYHTd6o", "error": "protocol: expected piece hash; storage node overloaded", "errorVerbose": "group:\n--- protocol: expected piece hash\n\tstorj.io/storj/uplink/piecestore.(*Upload).Commit:229\n\tstorj.io/storj/uplink/piecestore.(*BufferedUpload).Commit:45\n\tstorj.io/storj/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece.func3:236\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:266\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112\n--- storage node overloaded"}


2020-01-02T18:48:29.778Z	DEBUG	ecclient	Failed dialing for putting piece to node	{"Piece ID": "ZMPYXKZQN4EPSIMKCPO4IJEILYVOOSAHJ5VDQW2YKVDOK2N5OSIQ", "Node ID": "12MzeTyyZKxFzdUAy5FaoBJGeXW5ucUmY3DugAncsZS1F1PLL64", "error": "piecestore: rpccompat: dial tcp: lookup stadsbygd.ddns.net: Try again", "errorVerbose": "piecestore: rpccompat: dial tcp: lookup stadsbygd.ddns.net: Try again\n\tstorj.io/storj/pkg/rpc.Dialer.dialTransport:54\n\tstorj.io/storj/pkg/rpc.Dialer.dial:31\n\tstorj.io/storj/pkg/rpc.Dialer.DialNode:124\n\tstorj.io/storj/uplink/piecestore.Dial:51\n\tstorj.io/storj/uplink/ecclient.(*ecClient).dialPiecestore:67\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:197\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112"}

2020-01-02T18:49:29.810Z	DEBUG	ecclient	Failed dialing for putting piece to node	{"Piece ID": "UUIPODGMY6FFYPSK2C373TYLSW7XCOG6LCA2SQJNJPKCLOWXICGA", "Node ID": "12QFiGDkFVJ2eqFNpyxdyLatq2jgACezCXZtMQLd7QXTVUv5yhK", "error": "piecestore: rpccompat: dial tcp: lookup hurststorj.ddns.net: Try again", "errorVerbose": "piecestore: rpccompat: dial tcp: lookup hurststorj.ddns.net: Try again\n\tstorj.io/storj/pkg/rpc.Dialer.dialTransport:54\n\tstorj.io/storj/pkg/rpc.Dialer.dial:31\n\tstorj.io/storj/pkg/rpc.Dialer.DialNode:124\n\tstorj.io/storj/uplink/piecestore.Dial:51\n\tstorj.io/storj/uplink/ecclient.(*ecClient).dialPiecestore:67\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:197\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112"}

2020-01-02T18:50:54.855Z DEBUG ecclient Upload to storage node failed {"Node ID": "12mfQqhAPY5PjBMLf5JC8dBPzfVVyD7xEZ8vED49GXGsGarwu1U", "error": "protocol: expected piece hash; serial number is already used: usedserialsdb error: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/storj/pkg/pb.DRPCPiecestoreDescription.Method.func1:1064\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51", "errorVerbose": "group:\n--- protocol: expected piece hash\n\tstorj.io/storj/uplink/piecestore.(*Upload).Commit:229\n\tstorj.io/storj/uplink/piecestore.(*BufferedUpload).Commit:45\n\tstorj.io/storj/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece.func3:236\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:266\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112\n--- serial number is already used: usedserialsdb error: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/storj/pkg/pb.DRPCPiecestoreDescription.Method.func1:1064\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Tried uploading a single large file (300 MB). Failed. Various errors all throughout the gateway debug logs. Here are some:

2020-01-02T20:37:42.303Z ERROR gateway error: {"error": "Storj Gateway error: pending upload "Upload3" missing", "errorVerbose": "Storj Gateway error: pending upload "Upload3" missing\n\tstorj.io/storj/pkg/miniogw.(*MultipartUploads).Remove:230\n\tstorj.io/storj/pkg/miniogw.(*gatewayLayer).AbortMultipartUpload:96\n\tstorj.io/storj/pkg/miniogw.(*layerLogging).AbortMultipartUpload:143\n\tgithub.com/minio/minio/cmd.objectAPIHandlers.AbortMultipartUploadHandler:1297\n\tnet/http.HandlerFunc.ServeHTTP:2007\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:212\n\tgithub.com/minio/minio/cmd.securityHeaderHandler.ServeHTTP:657\n\tgithub.com/minio/minio/cmd.rateLimit.ServeHTTP:642\n\tgithub.com/minio/minio/cmd.pathValidityHandler.ServeHTTP:602\n\tgithub.com/minio/minio/cmd.httpStatsHandler.ServeHTTP:541\n\tgithub.com/minio/minio/cmd.requestSizeLimitHandler.ServeHTTP:66\n\tgithub.com/minio/minio/cmd.requestHeaderSizeLimitHandler.ServeHTTP:91\n\tgithub.com/minio/minio/cmd.crossDomainPolicy.ServeHTTP:51\n\tgithub.com/minio/minio/cmd.redirectHandler.ServeHTTP:243\n\tgithub.com/minio/minio/cmd.minioReservedBucketHandler.ServeHTTP:301\n\tgithub.com/minio/minio/cmd.cacheControlHandler.ServeHTTP:270\n\tgithub.com/minio/minio/cmd.timeValidityHandler.ServeHTTP:371\n\tgithub.com/rs/cors.(*Cors).Handler.func1:200\n\tnet/http.HandlerFunc.ServeHTTP:2007\n\tgithub.com/minio/minio/cmd.resourceHandler.ServeHTTP:492\n\tgithub.com/minio/minio/cmd.authHandler.ServeHTTP:300\n\tgithub.com/minio/minio/cmd.reservedMetadataHandler.ServeHTTP:134\n\tgithub.com/minio/minio/cmd/http.(*Server).Start.func1:108\n\tnet/http.HandlerFunc.ServeHTTP:2007\n\tnet/http.serverHandler.ServeHTTP:2802\n\tnet/http.(*conn).serve:1890"}

2020-01-02T19:13:49.853Z DEBUG ecclient Upload to storage node failed {"Node ID": "1T9dRgsaydcSXCg3R4FD77cPovnMXqkRVYEmxtnYKYy1VzDbGR", "error": "protocol: expected piece hash; context canceled", "errorVerbose": "group:\n--- protocol: expected piece hash\n\tstorj.io/storj/uplink/piecestore.(*Upload).Commit:229\n\tstorj.io/storj/uplink/piecestore.(*BufferedUpload).Commit:45\n\tstorj.io/storj/uplink/piecestore.(*LockingUpload).Commit:105\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece.func3:236\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:266\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112\n--- context canceled"}

2020-01-02T19:07:21.786Z DEBUG ecclient Upload to storage node failed {"Node ID": "12mcDKykUvpULJf7ApAx8UNM9mKE4Uz3z59jVoe4bRymxEDkfYn", "error": "piecestore: rpccompat: context canceled", "errorVerbose": "piecestore: rpccompat: context canceled\n\tstorj.io/storj/pkg/rpc.Dialer.dialTransport:74\n\tstorj.io/storj/pkg/rpc.Dialer.dial:31\n\tstorj.io/storj/pkg/rpc.Dialer.DialNode:124\n\tstorj.io/storj/uplink/piecestore.Dial:51\n\tstorj.io/storj/uplink/ecclient.(*ecClient).dialPiecestore:67\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:197\n\tstorj.io/storj/uplink/ecclient.(*ecClient).Put.func1:112"}

@utdrmac are you still having this issue?

What does the sync command look like?

Ok got it

aws s3 --endpoint=http://localhost:7777/ sync directory s3://bucket-name
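(Side note: if node overload turns out to be a factor, the AWS CLI's S3 transfer concurrency can be tuned down via its standard s3 settings; the values below are just a starting point, not a recommendation from the docs:)

```ini
# ~/.aws/config
[default]
s3 =
    max_concurrent_requests = 4
    multipart_chunksize = 64MB
```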

Have not tried since originally posting. I was hoping to get some replies/answers from the dev team before trying again.

Ok, let me boost the signal on it; it is the weekend so it may take a little while to hear anything, but hang in there :slight_smile:

Failed DNS resolutions like this make me think there’s some larger issue. I’ll put a ticket into the system to try to categorize what different errors we can get from these dynamic DNS providers under various circumstances.

dial tcp: lookup hurststorj.ddns.net: Try again
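That "Try again" string corresponds to getaddrinfo's temporary-failure case (EAI_AGAIN), as opposed to a hard "no such host" error. A minimal sketch of how to tell the two apart (the hostname passed in is just an example):

```python
# Distinguish a temporary DNS failure (EAI_AGAIN, the "Try again" case)
# from a hard resolution failure. Illustrative sketch only.
import socket

def resolve(host: str):
    try:
        infos = socket.getaddrinfo(host, None)
        return sorted({info[4][0] for info in infos})  # unique addresses
    except socket.gaierror as e:
        if e.errno == socket.EAI_AGAIN:
            return "temporary failure -- worth retrying (the 'Try again' case)"
        return f"hard failure: {e}"

print(resolve("localhost"))  # e.g. ['127.0.0.1', ...]
```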

These context canceled issues also feel like simple “bad/slow connection” noise. They’re essentially connection timeouts.

"piecestore: rpccompat: context canceled", "errorVerbose": "piecestore: rpccompat: context canceled

The following may indicate a programming issue on our part, but it seems to result from an already aborted upload:

"Storj Gateway error: pending upload "Upload3" missing

I haven’t talked to the rest of the team yet, but I will take a close look at this issue:

"protocol: expected piece hash; serial number is already used: usedserialsdb error: database disk image is malformed

Overall, yeah, there’s something definitely wrong with your setup. Would it be possible to try it on another computer? Or perhaps if you’re on wifi, to try a hard-wired connection? I know these are thin “did you try turning it off and turning it back on” words of advice, but right now the arrows are mostly pointing at “bad internet.” :confused:


Hey @wthorp. Everything is hardwired already, and the DNS resolver is the same one used by everyone else on this network; I have not received any complaints/tickets. My laptop (also hardwired) uses the same resolver and has no issues. I’ll try running the gateway and some uploads on my laptop. I can’t imagine what is wrong with “my setup” since the only thing running is the gateway.