Is there any form of lifecycle management of files stored in a bucket?
Backblaze has it - as does Amazon
You can set an expiration date for objects, if that is what you mean. Developers can then build such a service around Storj DCS if their requirements call for it.
Through the GUI?
I don’t see a way of doing this.
I guess there might be in the CLI - but I am not using that
The Storj GUI is not meant to be full-featured. The interface is mainly there to give developers and others a high-level understanding of the tools. Most of the functionality is available at the command line or via S3 connectivity to existing (and new) applications that leverage Storj; which features those applications use is up to the developers and what they need.
No, the GUI doesn’t have this feature; you need to use the uplink CLI, for example:
The other way is to generate an access grant/S3 credentials with a TTL built in, e.g.
uplink share --max-object-ttl 24h --readonly=false --not-after=none sj://my-bucket
then use this access grant in any of your tools. All objects uploaded with this access grant will expire after 24h.
You may also generate S3 credentials with the same behavior:
uplink share --max-object-ttl 24h --readonly=false --not-after=none --register sj://my-bucket
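If you generate these credentials from a script rather than by hand, the two commands above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_share_command` is hypothetical, and it uses nothing beyond the flags already shown in this post.

```python
import shlex


def build_share_command(bucket, max_object_ttl, register=False):
    """Return the argv list for an `uplink share` call with a TTL baked in.

    Only the flags quoted in the post above are used; `register=True`
    adds --register to request S3 credentials instead of an access grant.
    """
    argv = [
        "uplink", "share",
        "--max-object-ttl", max_object_ttl,
        "--readonly=false",
        "--not-after=none",
    ]
    if register:
        argv.append("--register")
    argv.append(f"sj://{bucket}")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_share_command("my-bucket", "24h", register=True)))
```

Passing the resulting list to `subprocess.run` would execute it; printing it first makes it easy to check that it matches the command you intended.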
We’ve just hit this problem when trying to migrate customers from both S3 and Backblaze. Both have lifecycle policies on the buckets that delete the files automatically after 30 days.
We are a big user of rclone and this doesn’t seem to support the expiry feature described above.
Hello @dogsbody,
Welcome to the forum!
You have two options:
See
I would like to awaken this request. I have objects that need to expire after a certain amount of time has passed after the object is marked as deleted in a versioned bucket. Currently I have to manually clear out old/deleted objects using rclone. I don’t have a way to predict when the objects should expire at upload time as they may be arbitrarily deleted. All objects have a 30d compliance lock applied at upload time and extended as needed.
With Backblaze this deletion happened automatically based on the defined object lifecycle rule such as “delete old versions after x days”. This is the feature that I am missing.
We have a feature to upload objects with a TTL; you can also create an access grant/S3 credentials that carry this TTL. The docs are linked above.
However, Object Lock and TTL are mutually exclusive.
So you should use either Object Lock or a TTL.
Please also note that rclone doesn’t delete versions; it puts a delete marker, as described in the article Working with delete markers - Amazon Simple Storage Service.
To delete versions you need to specify this option, e.g.
rclone delete --s3-versions --s3-version-deleted --min-age 30d --rmdirs -P storj-s3:my-bucket/my-prefix/
You can still delete the object and all its versions even if they were uploaded with a TTL. But if you use the Object Lock feature, you will not be able to upload with a TTL. And depending on the Object Lock mode (Compliance, Governance or Legal Hold), you might not be able to delete an unexpired version at all.
Thank you for the rclone command, that is actually very helpful. I was using
rclone backend cleanup-hidden storj-s3:backup
before and it errors on every object still locked because it doesn’t check if it is locked before trying to delete it. The software I am using is kopia backup. It automatically extends the object locks to prevent backups being deleted, and puts delete markers on objects that are no longer needed. It does not remove hidden objects after their lock expires, so I have to use rclone for that step. This is where the lifecycle management feature would normally remove the deleted (hidden) objects after the lock expires.
I’m aware of the TTL feature, but it doesn’t work in this case as the object lock and TTL are mutually exclusive as you said. It’s also impossible to tell how long a certain object will need to exist at upload time, so no TTL.
I’m putting this out here for anyone else who decides to try these commands.
I used the rclone command rclone delete --s3-versions --s3-version-deleted --min-age 30d --rmdirs -P storj-s3:my-bucket/my-prefix/ you gave me and it deleted every object that was 30 days old (by adding a delete marker to them). This broke my backup. I was able to recover by removing all the delete markers to restore the versioned objects. Just putting this out here in case anyone else has the same problem.
I believe the issue here is that the rclone delete command automatically puts a delete marker on versioned objects when using the delete command. Then it chose objects that were 30 days old and deleted them. It was not removing already deleted (hidden) objects.
rclone backend cleanup-hidden does sort of work, but rclone doesn’t support object locks so it fails in a safer way. It will delete the unlocked objects with delete markers only, but then it tries to delete objects still locked, starting with the latest version which is always the delete marker, then it gets a 403 when it tries to remove the underlying object. So it actually ends up undeleting all the objects that are still locked. I’m looking into scripting this myself as rclone can’t handle it. rclone hasn’t implemented this because every other S3 provider has basic lifecycle management features that take care of this edge case automatically.
You are correct: rclone puts a delete marker with the delete command unless all versions are requested to be deleted (which is not the case when you use a filter like older than 30 days). However, I expected that it would delete all previous versions as well. It seems I was wrong.
I do not have any suggestions for how to overcome this at this time, and I will share your feedback with the team.
Solid request. In M&A work, especially when we’re evaluating digital infrastructure during a business sale, lack of lifecycle visibility can become a blocker. Features that help define and control data/service lifecycles would be a win from both operational and compliance angles.
(Posted from the perspective of someone active in business brokerage — Phoenix, Peterson Acquisitions)
Hello @katehiggins,
Welcome to the forum!
Lifecycle management exists: Setting Object Lifecycles - Storj Docs
Could you please clarify the M&A use case that needs it per bucket, where the same cannot be done per object?
Hi everyone,
TL;DR: There is currently no way to automatically remove objects from a versioned bucket that are no longer locked and are marked as deleted. Right?
I also stumbled upon this problem while trying to implement immutable S3 backups with Storj and TrueNAS.
TrueNAS has an integrated job that allows sync, meaning it will delete objects in the cloud that no longer exist at the source.
If you consider backups managed by software like Duplicati or Veeam, this is really great, as they handle the deletion of old backups in the chain and TrueNAS handles the delete marker in the cloud during a sync operation.
My problem, however, is that versions marked as “deleted” linger, and are billed, for as long as manual deletion takes, as there is no possibility of lifecycle management in versioned buckets. Or am I missing something?
This resulted in a 300GB backup chain, which should only be 30% bigger in the cloud due to versioning, taking up 2TB in the cloud, as every version ever created still existed in the bucket.
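To put rough numbers on that, here is a small sketch of the arithmetic. It assumes decimal units (2TB = 2000GB) and reads "30% bigger" as 30% on top of the 300GB source; both are assumptions made only for illustration.

```python
def expected_versioned_size_gb(source_gb, overhead_percent):
    """Size the chain should occupy if versioning adds a fixed percentage."""
    # Integer math keeps the result exact for whole-GB inputs.
    return source_gb * (100 + overhead_percent) // 100


source_gb = 300
expected_gb = expected_versioned_size_gb(source_gb, 30)  # 300GB plus 30%
actual_gb = 2000                                         # 2TB, decimal units assumed
excess_gb = actual_gb - expected_gb                      # storage billed beyond expectation
ratio = actual_gb / expected_gb                          # how many times larger than expected

if __name__ == "__main__":
    print(f"expected {expected_gb} GB, actual {actual_gb} GB, "
          f"excess {excess_gb} GB ({ratio:.2f}x)")
```

Under those assumptions the bucket holds roughly five times what the backup chain should need.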
Hi @Kazumsan,
Welcome to the forum!
I’ve forwarded your post internally and someone will return with a reply.
Hello @Kazumsan,
Welcome to the forum!
There are many: Deleting Buckets Using Different Tools - Storj Docs
However, I believe you mean without running any additional command? If so, you can either upload with a TTL or build the TTL into the access grant/S3 credentials, and expired versions will be deleted automatically by the system.
But if you also enable Object Lock, you will be unable to use the TTL feature; they are mutually exclusive.
If you want to use Object Lock with housekeeping, you should use tools that support it correctly, like Veeam.
I think many backup tools may have this feature if they are able to recognize and use Object Lock properly; otherwise it will be up to you to delete expired versions after them.
However, if you use a backup tool that is aware of neither versioning nor Object Lock, and it is not a simple sync (which is actually a replication solution, not a backup solution), then you must use neither object versioning nor Object Lock, even if a TTL were possible: you may corrupt the tool’s hash structure, and it would be impossible to restore from such a backup copy. This is because you cannot be sure which index file or pack can be safely deleted, so an automatic TTL may destroy it.
For the simple sync case you may enable object versioning and use a TTL integrated into the S3 credentials. The result could be called a backup solution, because you would be able to restore an older version of a single file, for example. However, it would likely consume more storage and more segments, which only drives your bill up (you would have several copies of the same object as its versions).
So I would recommend configuring a TrueCloud Task instead. It is restic with a UI, so it has snapshots and packs smaller files into bigger chunks, reducing your storage and segment costs. But in that case you must not use versioning, because versions are already built into that backup solution.
Object Lock could be enabled, though, but you will have the same problem: you will not know which versions with expired locks are safe to delete. You may assume that versions are safe to delete if there is a delete marker, but that is difficult to automate: you need to select only objects with a delete marker and delete only the versions of those objects. So it is better not to use this feature either in that case.
I found a workaround to delete only expired versions:
uplink ls --encrypted --recursive --all-versions sj://locked-bucket/ -o json | jq '"uplink rm --encrypted --version-id " + .versionId + " sj://locked-bucket/" + .key' -r | bash
It will throw an error whenever it is unable to delete a locked version, but it will continue and remove every version that it can.
However, it cannot be used with a backup repository, because if some files of the backup repository are already expired but not updated by the backup solution, it will destroy/corrupt the backup.
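The one-liner above pipes generated text straight into bash, which can misbehave for keys containing spaces or shell metacharacters. Below is a hedged sketch of the same idea in Python that builds one argument list per version instead. It assumes that `uplink ls ... -o json` prints one JSON object per line with the `key` and `versionId` fields that the jq filter relies on; the function name `build_rm_commands` is hypothetical.

```python
import json


def build_rm_commands(listing_lines, bucket):
    """Turn JSON lines from `uplink ls ... -o json` into `uplink rm` argv lists.

    Assumption: every version entry has "key" and "versionId", as the jq
    filter in the post above implies. Entries without a versionId are skipped.
    """
    commands = []
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the listing
        entry = json.loads(line)
        version_id = entry.get("versionId")
        key = entry.get("key")
        if not version_id or key is None:
            continue  # nothing version-specific to remove
        commands.append([
            "uplink", "rm", "--encrypted",
            "--version-id", version_id,
            f"sj://{bucket}/{key}",
        ])
    return commands
```

Each list could then be run with `subprocess.run(cmd, check=False)`, so that a failure on a still-locked version is reported without stopping the loop, matching the behavior described above. The same warning applies: do not point this at a backup repository.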
If you want to use a backup solution together with Object Lock, you must use a backup solution that is compatible with Object Lock, like Veeam. You must not use any lifecycle management outside of the backup solution.
This holds regardless of whether lifecycle management exists for Object Lock: it will destroy/corrupt the backup on any S3-compatible provider if the backup solution is not Object Lock aware.
I use a backup tool called Kopia that is object lock aware. It actively extends the object locks on “active” pieces in the hash store. When a pack file is no longer needed, it puts a delete marker on the object, then the lock expires after some configured time and the underlying object storage provider is supposed to clean up the expired objects. S3, B2 and Scaleway all implement this in some form. It’s part of the S3 standard and is necessary for true object lock/versioning compatibility. All three providers have a configurable “remove expired objects” setting, meaning objects that have delete markers are removed when their locks expire.
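For readers unfamiliar with what such a rule looks like, here is a sketch of the lifecycle configuration structure used by Amazon S3, in the shape that boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument. This shows AWS only; nothing in this thread indicates that the Storj gateway accepts it. The helper name and rule ID are made up for illustration.

```python
def noncurrent_cleanup_rule(noncurrent_days, prefix=""):
    """Build an S3 lifecycle configuration that cleans up old versions.

    NoncurrentVersionExpiration removes versions that have been noncurrent
    (for example hidden behind a delete marker) for `noncurrent_days` days.
    ExpiredObjectDeleteMarker removes a delete marker once no other
    versions of that key remain.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{noncurrent_days}d",  # hypothetical ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


if __name__ == "__main__":
    import json
    print(json.dumps(noncurrent_cleanup_rule(30), indent=2))
```

The function only builds the dictionary, so it can be inspected without any AWS credentials.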
Right now this only half works, because the objects that kopia “deleted” aren’t being removed and just take up more space over time. It is the exact same situation that @Kazumsan is experiencing with TrueNAS and Veeam.
I appreciate the workaround, @Alexey, but this is not a long-term solution. The S3 gateway doesn’t correctly return delete markers when queried, so tools that have this feature, like rclone, are also broken, and the workaround has to cycle through every object version, try to delete it, and get an error back when it’s locked. A solution like this will also remove delete markers, as those don’t usually get object locks. So backup tools that work in this way don’t fully work, as they get confused by “undeleted” objects showing back up when the delete markers are removed.
My solution is to move to an actual S3 compliant provider until this gets fixed properly.
Hello @AussieNick,
Welcome back!
It supports Object Lock fully, so it deletes expired versions itself and does not rely on a provider.
This means that Kopia actually is not Object Lock aware, since it cannot manage locked versions itself and is forced to rely on a provider. If automatic cleanup by the provider counted, you could say that any backup tool is Object Lock aware, which would not be true.
They should not show up, unless they are still locked. For the case where the backup tool puts delete markers on still-locked objects, you need a smarter script that selects only objects which are unlocked and under a delete marker. Or make it simpler: delete only the versions under the delete marker, and then delete the marker if all previous versions were deleted.
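The selection logic described here could look roughly like the sketch below. It is an illustration only: the record fields (`key`, `version_id`, `is_delete_marker`, `is_latest`, `retain_until`) are a hypothetical data model that you would have to fill from your own version listing, and legal holds are not modeled at all.

```python
from collections import defaultdict
from datetime import datetime, timezone


def plan_cleanup(versions, now):
    """Return (key, version_id) pairs that are safe to delete.

    A key is only considered when its latest version is a delete marker.
    Beneath the marker, only versions whose retention has ended (or that
    never had one) are selected. The marker itself is selected only when
    every version beneath it is selected too, so nothing gets "undeleted".
    """
    by_key = defaultdict(list)
    for v in versions:
        by_key[v["key"]].append(v)

    to_delete = []
    for items in by_key.values():
        latest = next((v for v in items if v["is_latest"]), None)
        if latest is None or not latest["is_delete_marker"]:
            continue  # object is still live, leave it alone
        beneath = [v for v in items if v is not latest]
        unlocked = [
            v for v in beneath
            if v["retain_until"] is None or v["retain_until"] <= now
        ]
        to_delete.extend((v["key"], v["version_id"]) for v in unlocked)
        if len(unlocked) == len(beneath):
            # Nothing locked remains, so the marker can go as well.
            to_delete.append((latest["key"], latest["version_id"]))
    return to_delete


if __name__ == "__main__":
    now = datetime(2025, 1, 31, tzinfo=timezone.utc)
    sample = [
        {"key": "a", "version_id": "m1", "is_delete_marker": True,
         "is_latest": True, "retain_until": None},
        {"key": "a", "version_id": "v1", "is_delete_marker": False,
         "is_latest": False, "retain_until": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    ]
    print(plan_cleanup(sample, now))
```

Keeping the marker in place while any locked version remains beneath it is the design choice that avoids the "undeleted objects" problem mentioned earlier in the thread.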
It’s not part of an S3 standard, see
It’s an extension and provider-specific, otherwise providers wouldn’t do this:
We have an implementation of expiration too, and it is implemented as an extension; see Object-level TTL.