S3 ListObject bug

I am using the S3 gateway to access Storj. When a ListObjectsV2Request exceeds the key limit (MaxKeys) and the response indicates the results were truncated, the next request (made with the NextContinuationToken) returns the same data. The listing therefore loops through the same keys forever, since every call returns the same keys and still reports the results as truncated. I have tested this with custom code against Storj and against other S3-compatible services; the other services do not have this issue. "CloudBerry Explorer for Amazon" hits the same problem against Storj: an infinite loop trying to retrieve the keys. The ListObjects request specifies a prefix, if that helps, and the behavior is the same whether I specify 25 or 1000 maximum keys.

Sorry if this is not the place to submit this. I submitted it through support but have not heard anything back. Listing objects is a fundamental function of the storage system, and without it working the service cannot be used. If it were just my code, I would assume the issue was on my side, but the same code works against Amazon S3 and other S3-compatible storage. Both my code and CloudBerry Explorer break when attempting to list the contents, and this can cause an infinite loop retrieving the content because IsTruncated is true on every response whenever more results exist.

I have included an example below. Any help would be appreciated.

As an example, we store several hundred thousand to several million objects in a bucket, with several thousand objects grouped in each folder.

The bucket is 20019, containing keys such as the following (different naming, but the same folder depth):

TEST/GROUP1/2021/0101/2021000001.TIF
TEST/GROUP1/2021/0101/2021000002.TIF
TEST/GROUP1/2021/0101/2021000004.TIF
TEST/GROUP1/2021/0101/2021000005.TIF
…
TEST/2021/0101/2021002640.TIF

When a call is made to list the files (specifying a maximum of 100 keys), it returns the first 100 objects (set A) with IsTruncated=true, indicating more objects exist. The next call, issued with the continuation token to retrieve the following 100 objects, also works (set B). But every subsequent call returns the same 100 objects (set B), so the listing never completes. I have tested with MaxKeys set to 25, 100, and 1000; the result is the same.
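To make the repetition concrete, a quick check along these lines (client setup assumed, and this is only a sketch, not my exact test code) reproduces the behavior: the third page comes back identical to the second.

// Verification sketch; assumes an existing IAmazonS3 client named s3Client.
// Requires: using System.Linq; using Amazon.S3; using Amazon.S3.Model;
var request = new ListObjectsV2Request
{
  BucketName = "20019",
  Prefix = "TEST/GROUP1/2021/0101/",
  MaxKeys = 100
};

var pageA = await s3Client.ListObjectsV2Async(request);   // first 100 keys (set A)
request.ContinuationToken = pageA.NextContinuationToken;

var pageB = await s3Client.ListObjectsV2Async(request);   // next 100 keys (set B)
request.ContinuationToken = pageB.NextContinuationToken;

var pageC = await s3Client.ListObjectsV2Async(request);   // should be new keys, but matches set B

bool samePage = pageB.S3Objects.Select(o => o.Key)
  .SequenceEqual(pageC.S3Objects.Select(o => o.Key));
Console.WriteLine($"Third page repeats second page: {samePage}");  // prints True against the gateway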

To include a sample, the following is the simplified C# code.

public async Task<List<S3Object>> ListFiles(string s3bucketName, string s3Subdirectory = "")
{
  List<S3Object> files = new List<S3Object>();

  try
  {
    var listRequest = new ListObjectsV2Request
    {
      BucketName = s3bucketName,
      MaxKeys = 1000
    };

    if (!string.IsNullOrEmpty(s3Subdirectory))
    {
      // Normalize the prefix: ensure a trailing slash and strip any leading slash.
      if (s3Subdirectory[s3Subdirectory.Length - 1] != '/')
      {
        s3Subdirectory += "/";
      }

      listRequest.Prefix = s3Subdirectory.TrimStart('/');
    }

    ListObjectsV2Response response;
    do
    {
      response = await S3Client.ListObjectsV2Async(listRequest);

      if (null == response)
      {
        break;
      }

      files.AddRange(response.S3Objects);

      // When the response is truncated, feed the NextContinuationToken back into
      // the request to fetch the next page of keys.
      if (response.IsTruncated)
      {
        listRequest.ContinuationToken = response.NextContinuationToken;
      }
    } while (response.IsTruncated);
  }
  catch (Exception)
  {
    throw;
  }

  return files;
}
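
Until this is fixed on the gateway side, a defensive tweak to the do/while loop above, along these lines (only a sketch, not production code), at least fails fast instead of looping forever when the same page is returned twice:

    // Workaround sketch: stop if a page starts with the same key as the previous
    // page, which matches the symptom above (the same set of keys repeating).
    string previousFirstKey = null;
    ListObjectsV2Response response;
    do
    {
      response = await S3Client.ListObjectsV2Async(listRequest);

      string firstKey = response.S3Objects.Count > 0 ? response.S3Objects[0].Key : null;
      if (firstKey != null && firstKey == previousFirstKey)
      {
        throw new InvalidOperationException(
          "Listing is not progressing: the same page was returned twice.");
      }
      previousFirstKey = firstKey;

      files.AddRange(response.S3Objects);

      if (response.IsTruncated)
      {
        listRequest.ContinuationToken = response.NextContinuationToken;
      }
    } while (response.IsTruncated);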

We have a Jira issue for this bug, and the team is working on it.

Hi daveM!

Thank you so much for reporting this issue and providing so much detail. We fixed the problem with this change: https://review.dev.storj.io/c/storj/gateway/+/5066. We also extended our test suite to make sure this regression never happens again.

If you use the self-hosted gateway, you can build the gateway with these changes and use it right away. If you use the Storj-hosted gateway, you will need to wait for our next release, which we are targeting for Monday.

Again, thank you so much for the report. I look forward to seeing you use Storj DCS through our S3-compatible gateway. Your use case sounds very promising!

Artur


Hi Artur!

I have verified the fix (using the storj-hosted gateway) and have been able to resume the synchronization/upload process! Thanks for your help and everyone’s hard work to get a quick resolution!

daveM
