Significant issues with file retrieval

Since yesterday I’ve been almost 100% unable to retrieve anything I upload. I’m using the latest version of the gateway with Cloudberry Explorer. I also tried S3 Explorer today and am experiencing the same issues…

I’m also seeing some errors in the gateway, but not all the time. Perhaps more concerning is that the files appear to upload correctly with no errors. Screenshots below:

Are you sure you are OK displaying your access key and secret key?


Those errors say that not enough uploads were successful to store the piece, so the upload didn’t finish correctly. Are you using an older bucket? I think buckets created on older versions were given low RS settings, which makes these issues much more prevalent. If you create a new bucket in the latest version, it should use the current RS settings. This may fix your problem.
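
For reference, here is a minimal boto3 sketch of creating a fresh bucket through the gateway’s S3-compatible API; the endpoint, keys, and bucket name are placeholders you’d substitute from your own gateway configuration:

```python
# A minimal sketch of creating a fresh bucket through the gateway's
# S3-compatible API. Endpoint, keys, and bucket name are placeholders
# for your own gateway configuration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:7777",  # assumed local gateway address
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# A bucket created now should get the current RS settings rather than
# inheriting the low ones from an older version.
s3.create_bucket(Bucket="fresh-test-bucket")
```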

Though I thought this update would also overwrite the bad RS settings for older buckets. Apparently not.

It’s only a test machine with test data, so I’m OK with it.

I’ve tried everything, including uploading to old buckets, but the screenshots were from a brand-new bucket.

I also don’t seem to get the upload errors in the gateway all the time, yet the file is usually unavailable.

That’s interesting, because my storage node’s upload failure rates are through the roof since the latest update. I was around 86% successful before this update, and now I’m down to almost 30% successful uploads. I wonder if there is some correlation.

It would be good if someone else could try uploading via the gateway and then retrieving a file.

I haven’t tried using the uplink…

I’m also on the Europe satellite.

I assume you mean on the storagenode side?
Yes, there is a correlation. I believe that before the update, 95 uploads were started and, once 80 were done, the remaining 15 were cancelled. Now 110 are started and 30 of them are cancelled. That’s already a doubling of the context-cancelled amount, but for some nodes the impact may be much higher: if you consistently came in around 20th-from-last to finish the transfer, you would have succeeded with the previous settings but fail with the new ones.
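
To put rough numbers on it (the threshold of 80 successful pieces is my reading of the behavior above, not an official figure):

```python
# Rough arithmetic behind the correlation, using the numbers above.
# The threshold of 80 successful pieces is inferred from "once 80
# were done", not an official figure.
before_started, after_started, success_needed = 95, 110, 80

cancelled_before = before_started - success_needed   # 15
cancelled_after = after_started - success_needed     # 30

print(f"cancel rate before: {cancelled_before / before_started:.1%}")  # 15.8%
print(f"cancel rate after:  {cancelled_after / after_started:.1%}")    # 27.3%

# A node that consistently finishes ~20th from last was outside the
# 15-node cancellation window before (it succeeded), but is inside
# the 30-node window now (it gets cancelled).
```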

Have you tried skipping the third-party clients and just using the AWS CLI? Do you see the same issues that way?

No idea how to do that, but it has been working fine with the clients previously.

https://documentation.tardigrade.io/api-reference/s3-gateway#reconfiguring-the-aws-cli-to-interface-with-the-tardigrade-network

The AWS CLI may also give you some more verbose feedback.
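
If scripting is easier than the CLI, the same upload-then-retrieve round trip can also be sketched with boto3; the endpoint, keys, and file/bucket names below are placeholders, and the debug logger stands in for the CLI’s more verbose feedback:

```python
# A sketch of the upload-then-retrieve round trip against the gateway
# using boto3. Endpoint, keys, and file/bucket names are placeholders;
# the debug logger gives wire-level output comparable to the AWS CLI's
# verbose feedback.
import logging
import boto3

boto3.set_stream_logger("botocore", logging.DEBUG)

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:7777",  # assumed local gateway address
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

s3.upload_file("testfile.bin", "fresh-test-bucket", "testfile.bin")
s3.download_file("fresh-test-bucket", "testfile.bin", "retrieved.bin")
```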

That makes sense; I didn’t think about that change having this impact. And yes, it is on the storage node side. I figured it wasn’t so much a problem as a change at some point causing this, so thanks for the FYI!