Script using aws s3 to delete data older than x days very slow

Hello community,
I'm trying to use a script to delete all files older than 7 days in the media cache directory of an S3 bucket on Storj. However, the process is very slow, roughly one file processed per second (there are currently about 80,000 cache files in the bucket).
Is there a faster way to achieve the same goal (delete all files older than x days in folder y of bucket z)?

See my script below:

#!/bin/bash

# Usage: ./s3d "bucketname" "7 days"

ENDPOINT="https://gateway.storjshare.io"
olderThan=$(date -d "$2 ago" "+%s")

aws s3 --endpoint-url="$ENDPOINT" ls "s3://$1/cache/" --recursive | grep -v " DIR " | while read -r line; do
    # columns 1 and 2 of `aws s3 ls` are the object's date and time
    createDate=$(date -d "$(echo "$line" | awk '{print $1" "$2}')" "+%s")
    if [ "$createDate" -le "$olderThan" ]; then
        # column 4 is the object key
        fileName=$(echo "$line" | awk '{print $4}')
        if [ -n "$fileName" ]; then
            aws s3 --endpoint-url="$ENDPOINT" rm "s3://$1/$fileName"
        fi
    fi
done

I'd be happy about any hints…
Christian

You can really only improve the deletion, not the search; see for example amazon s3 - Most efficient way to batch delete S3 Files - Server Fault
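For reference, a rough sketch of what batch deletion could look like: list the keys under cache/, filter them by LastModified, and remove them in batches of up to 1000 per `aws s3api delete-objects` call instead of one `aws s3 rm` per object. The script name, the jq dependency, and the lexicographic ISO 8601 date comparison are my assumptions, not taken from the linked answer.

#!/bin/bash
# Sketch: batch deletion with `aws s3api delete-objects` (up to 1000 keys per call).
# Assumes jq is installed and that LastModified comes back as ISO 8601 UTC,
# so a lexicographic comparison against the cutoff works.
# Usage: ./s3-batch-delete "bucketname" "7 days"

ENDPOINT="https://gateway.storjshare.io"
cutoff=$(date -u -d "$2 ago" "+%Y-%m-%dT%H:%M:%S")

aws s3api --endpoint-url="$ENDPOINT" list-objects-v2 \
    --bucket "$1" --prefix "cache/" --output json \
  | jq -r --arg cutoff "$cutoff" '.Contents[]? | select(.LastModified <= $cutoff) | .Key' \
  | while mapfile -t -n 1000 keys && ((${#keys[@]})); do
        # build {"Objects":[{"Key":...},...],"Quiet":true} for this batch of keys
        payload=$(printf '%s\n' "${keys[@]}" | jq -R '{Key: .}' | jq -s '{Objects: ., Quiet: true}')
        aws s3api --endpoint-url="$ENDPOINT" delete-objects \
            --bucket "$1" --delete "$payload"
    done

This mainly saves the per-object round trip of a separate `rm` call; the listing itself still has to walk all 80,000 keys.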

The other way is to not use the S3 interface when you upload your files; the uplink CLI allows you to specify an expiration date for each object, and they will be removed automatically.

uplink cp --expires +7d --recursive ./my-files sj://my-bucket

Thanks for the reply, Alexey. I will have a look at the link to speed up the deletion at least a bit. Currently new objects come in faster than I can delete the expired ones :wink:
The software currently only has one upload path for all files, but some folders need to be persistent while others can be flushed every x days, so switching to the uplink CLI would require some more changes to the software.

The scripts from your link are fine, but now I've found my actual issue.
There was a webinar on your site about enabling Storj for Pixelfed, and there it was recommended to protect the bucket against deletion:

Download : Allowed
Upload : Disallowed
Lists : Disallowed
Deletes : Disallowed

So the script runs, but gets:

"Errors": [
    {
        "Key": "cache/accounts/headers/109/455/731/601/681/966/original/40da79101b2b9e18.jpg",
        "VersionId": "",
        "Code": "AccessDenied",
        "Message": "Access Denied."
    }

Is there any way to change the permissions later on, or do I need to start from scratch?

Christian

You need to generate a new access grant and/or S3 credentials that allow deletions (you will likely also need the list permission to search for objects), and use those credentials for the deletions.
You should use the same encryption passphrase as during upload (i.e. the same one as in your first access grant/S3 credentials).
Both sets of credentials will work fine in parallel; you do not need to start from scratch, you only need the same encryption passphrase with different access permissions.
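
For completeness, this is roughly how such delete-capable S3 credentials could be derived from an existing access with the uplink CLI; the exact flags are my assumption, so please double-check them against `uplink share --help` (the bucket name and prefix below are placeholders):

# Sketch (flag names assumed; verify with `uplink share --help`):
# derive credentials limited to the cache/ prefix that also allow list and delete,
# and register them with the hosted gateway so they can be used with the aws CLI.
# The encryption passphrase is inherited from the existing access, so both sets
# of credentials see the same data.
uplink share sj://my-bucket/cache/ --readonly=false --register

If a narrower grant is preferred, uplink also has individual --disallow-* flags for single permissions, but the exact names are worth verifying in the help output.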


Neat, I'll give it a try.


It works like a charm! Thanks for the hint.
