Atomic snapshots for buckets

I talked to a heavy AWS S3 user today. He complained that they’d love to have some form of atomic snapshot for whole buckets, and AWS S3 does not offer anything like that. They apparently keep tens of millions of objects in a single bucket, and those objects keep changing. Any iteration over all items in the bucket is bound to take hours. They’d love to be able to take an atomic snapshot of a bucket’s state for backup and investigation purposes.

Is this possible with server-side copies on Storj?

A snapshot, yes, but it would not be instant, and thus not atomic. You can increase --transfers to speed it up, though:

```shell
./uplink mb sj://test2
./uplink cp --recursive --transfers 20 sj://test sj://test2
```
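Why a key-by-key copy is not atomic can be shown with a tiny in-memory model. This is a hypothetical sketch, not how uplink is implemented: the bucket is a plain dict, `copy_key_by_key` stands in for a recursive copy, and the application writes to the bucket while the copy is in progress. More `--transfers` shortens the window, but a write landing mid-copy can still produce a mix of old and new objects.

```python
def copy_key_by_key(source: dict, writes_after_step: dict) -> dict:
    """Hypothetical model: copy `source` one key at a time, as a recursive copy
    does. After copying the key at index `step`, apply any concurrent writes
    scheduled for that step."""
    copy = {}
    for step, key in enumerate(sorted(source)):
        copy[key] = source[key]
        source.update(writes_after_step.get(step, {}))
    return copy


bucket = {"a": "v1", "b": "v1"}
states_seen = [dict(bucket)]
# After the copy has read "a", the application updates "a" and "b" together.
result = copy_key_by_key(bucket, {0: {"a": "v2", "b": "v2"}})
states_seen.append(dict(bucket))

print(result)                 # {'a': 'v1', 'b': 'v2'}
print(result in states_seen)  # False: the copy matches no state the bucket was ever in
```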

Yeah, for the use case he was describing, he really insisted on atomicity. He is working around the problem now by setting up an EBS volume with LVM and taking LVM snapshots instead, but he was clearly annoyed that he has to do so.

I think the problem can be solved easily: since any change to a file on Storj creates a new file, you just need to keep every old instance. It takes space (a lot of space), but you can recover every change.
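The "keep every instance" idea amounts to a versioned store in which an upload never overwrites. Below is a minimal sketch of that model; `VersionedBucket` and its methods are hypothetical names, not a Storj API. It also makes the cost visible: every retained version counts toward stored space.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedBucket:
    """Hypothetical model: an upload to an existing key appends a new version
    instead of overwriting, so every past instance stays retrievable."""
    versions: dict = field(default_factory=dict)  # key -> list of contents, oldest first

    def put(self, key: str, content: bytes) -> None:
        self.versions.setdefault(key, []).append(content)

    def get(self, key: str, version: int = -1) -> bytes:
        # version=-1 is the latest upload; 0 is the first one.
        return self.versions[key][version]

    def stored_bytes(self) -> int:
        # Every retained version counts toward stored space.
        return sum(len(c) for history in self.versions.values() for c in history)


bucket = VersionedBucket()
bucket.put("report.csv", b"v1 data")
bucket.put("report.csv", b"v2 data!")
print(bucket.get("report.csv"))             # b'v2 data!'
print(bucket.get("report.csv", version=0))  # b'v1 data'
print(bucket.stored_bytes())                # 15
```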

He stores hundreds of terabytes this way. Storing every past instance would be quite expensive.

You don’t need to hold it forever. Also, people usually change something like 10% of their files, or even less.

As I understand it, the oldest files in his dataset are ~3 months old. In other words, the stored history would grow by hundreds of terabytes every 3 months.
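A rough estimate of how much old data piles up, assuming (hypothetically) that the whole dataset is rewritten evenly once per turnover period and that every replaced version is kept for a fixed retention window. The 300 TB figure is only an illustration of "hundreds of terabytes".

```python
def retained_history_tb(dataset_tb: float, turnover_months: float,
                        retention_months: float) -> float:
    """Rough size of retained old versions, assuming the whole dataset is
    rewritten evenly once every `turnover_months` and every replaced version
    is kept for `retention_months`."""
    rewrites_in_window = retention_months / turnover_months
    return dataset_tb * rewrites_in_window


# Hypothetical numbers: 300 TB of live data, fully replaced every 3 months.
print(retained_history_tb(300, 3, 3))   # 300.0
print(retained_history_tb(300, 3, 12))  # 1200.0
```

With full turnover every 3 months, even a one-year retention window stores four times the live dataset in history alone.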

Also, a backup makes a copy of all files, whereas this way you only keep the files that change. Every full backup doubles the amount.
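The comparison between full backups and keeping only changed files can be put into numbers. The dataset size, change rate, and backup count below are hypothetical, and the model ignores deletions and deduplication.

```python
def full_backups_tb(dataset_tb: int, n_backups: int) -> int:
    # Each full backup stores another complete copy of the dataset.
    return dataset_tb * n_backups


def changed_only_tb(dataset_tb: int, change_percent: int, n_backups: int) -> float:
    # The first backup stores everything; each later one stores only changed files.
    return dataset_tb + dataset_tb * change_percent * (n_backups - 1) / 100


# Hypothetical: 300 TB dataset, 10% of files change between backups, 4 backups.
print(full_backups_tb(300, 4))      # 1200
print(changed_only_tb(300, 10, 4))  # 390.0
```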

The idea was to make a snapshot to be able to debug the state of the application and to make a consistent offline backup. So these snapshots would be short-lived.

The client can always apply some time-based policy, for example: after 3 months, delete all old versions and keep only the latest and one previous version. Moreover, if the snapshots are short-lived, the client just deletes them. When you change a file on Storj, you always have to upload a new file; then you decide whether to delete the old one or not. It doesn’t happen by itself.
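One way to read that retention policy, sketched for a single key: never touch the two newest versions, and delete any other version older than 90 days. The `(version_id, uploaded_at)` pairs and the function name are hypothetical; the client would then delete the returned versions itself (for example with `uplink rm`).

```python
from datetime import datetime, timedelta


def versions_to_delete(versions, now, max_age=timedelta(days=90), keep_latest=2):
    """versions: list of (version_id, uploaded_at) pairs for one key, in any order.
    Never touches the `keep_latest` newest versions; of the rest, returns the
    ids of those older than `max_age`, newest first."""
    newest_first = sorted(versions, key=lambda v: v[1], reverse=True)
    return [vid for vid, uploaded_at in newest_first[keep_latest:]
            if now - uploaded_at > max_age]


now = datetime(2023, 6, 1)
history = [
    ("v1", datetime(2023, 1, 1)),
    ("v2", datetime(2023, 4, 15)),
    ("v3", datetime(2023, 5, 1)),
    ("v4", datetime(2023, 5, 20)),
]
print(versions_to_delete(history, now))  # ['v1']
```

Here `v4` and `v3` are kept as the latest and previous versions, `v2` survives because it is younger than 90 days, and only `v1` is deleted.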

The difficulty is not in storing files, the difficulty is in knowing what was exactly the state of the bucket at a snapshot time. That is, this file existed and had this content, that file didn’t exist yet.
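Reconstructing "what the bucket looked like at time T" is easy only if you have a complete, ordered log of every change, deletions included. A plain listing of the current objects doesn't give you that. The sketch below assumes such a hypothetical log of `(timestamp, key, content)` events, with `None` marking a deletion.

```python
from bisect import bisect_right


def state_at(log, t):
    """log: list of (timestamp, key, content_or_None) events sorted by timestamp,
    where None marks a deletion. Returns {key: content} as of time t (inclusive)."""
    cut = bisect_right([ts for ts, _, _ in log], t)  # number of events at or before t
    state = {}
    for _, key, content in log[:cut]:
        if content is None:
            state.pop(key, None)  # the object was deleted
        else:
            state[key] = content  # the object was created or replaced
    return state


log = [
    (1, "a", "v1"),
    (2, "b", "v1"),
    (3, "a", "v2"),
    (4, "b", None),
    (5, "c", "v1"),
]
print(state_at(log, 3))  # {'a': 'v2', 'b': 'v1'}
print(state_at(log, 5))  # {'a': 'v2', 'c': 'v1'}
```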

Manually querying all files would be too slow.
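A back-of-the-envelope estimate of why a full listing takes so long: S3's ListObjectsV2 returns at most 1,000 keys per request, so tens of millions of objects means tens of thousands of sequential page requests. The 100 ms per page is a hypothetical round-trip time, and objects keep changing while the listing runs, so the result isn't a consistent view anyway.

```python
import math


def listing_hours(n_objects: int, page_size: int, seconds_per_page: float) -> float:
    """Time to page through a full bucket listing, one page at a time."""
    pages = math.ceil(n_objects / page_size)
    return pages * seconds_per_page / 3600


# 50 million objects, 1,000 keys per page, hypothetical 100 ms per page.
print(round(listing_hours(50_000_000, 1000, 0.1), 2))  # 1.39
```

That is well over an hour just to enumerate keys, before reading or copying a single object.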

Snapshots take up space as well, since they work the same way (some way of keeping what has changed).

But yeah, since every change on Storj is a new file, Storj is basically copy-on-write (COW), so adding snapshots on top of that should not be too difficult: just keep the old data until the snapshot is deleted.
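A minimal sketch of that COW idea, with every name hypothetical: object data is immutable and addressed by a version id, while the live bucket and each snapshot are just maps from key to version id. Taking a snapshot copies only the index, and garbage collection frees a version once neither the live bucket nor any snapshot references it. The snapshot is atomic here only because it is a single in-memory operation; a real service would have to freeze the index inside its metadata store.

```python
class CowBucket:
    """Hypothetical model: immutable object versions plus key -> version-id
    indexes for the live bucket and for each snapshot."""

    def __init__(self):
        self.data = {}       # version id -> content (never modified once written)
        self.live = {}       # key -> version id
        self.snapshots = {}  # snapshot name -> frozen copy of a key -> version-id map
        self._next_id = 0

    def put(self, key, content):
        vid = self._next_id
        self._next_id += 1
        self.data[vid] = content
        self.live[key] = vid
        self._gc()

    def delete(self, key):
        self.live.pop(key, None)
        self._gc()

    def snapshot(self, name):
        # Copying the index is all it takes; no object data is copied.
        self.snapshots[name] = dict(self.live)

    def drop_snapshot(self, name):
        del self.snapshots[name]
        self._gc()

    def read(self, key, snapshot=None):
        index = self.live if snapshot is None else self.snapshots[snapshot]
        return self.data[index[key]]

    def _gc(self):
        # Keep a version only while the live bucket or some snapshot references it.
        referenced = set(self.live.values())
        for index in self.snapshots.values():
            referenced.update(index.values())
        for vid in list(self.data):
            if vid not in referenced:
                del self.data[vid]


b = CowBucket()
b.put("a", "v1")
b.snapshot("s1")
b.put("a", "v2")
b.put("b", "v1")
print(b.read("a"))        # v2
print(b.read("a", "s1"))  # v1
print(len(b.data))        # 3
b.drop_snapshot("s1")
print(len(b.data))        # 2
```

Scanning all versions on every write keeps the sketch short; a real implementation would use reference counts instead.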

@Toyoo Would this user be willing to send us more details or get on a call to talk to a product manager at Storj?

We’d love to learn more about exactly the problem they are trying to solve. Send me a PM if they are willing to chat.

Hmmm… I’ll send him a message, we’ll see.

Maybe they can use Restic (see “Using Restic for Backups” in the Storj Docs).
It has a snapshot feature and doesn’t need 1:1 space, since it deduplicates data across snapshots.

By the way, a server-side copy on Storj DCS doesn’t use 100% extra space for the copy until you change the original; only then does it start to consume space.
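That space behaviour can be modelled as objects pointing at shared segments. This is a hypothetical sketch of the idea as the post describes it, not Storj’s actual metadata schema: a server-side copy adds only a new pointer list, and extra space appears once the original is re-uploaded while the copy still holds the old segments. Segment sizes are illustrative MiB values.

```python
class SegmentStore:
    """Hypothetical model: each object is a list of segment ids, and a
    server-side copy only adds another reference to the same segments."""

    def __init__(self):
        self.segments = {}  # segment id -> size in MiB
        self.objects = {}   # object key -> list of segment ids
        self._next = 0

    def upload(self, key, sizes):
        ids = []
        for size in sizes:
            self.segments[self._next] = size
            ids.append(self._next)
            self._next += 1
        self.objects[key] = ids  # replaces any previous version of the key
        self._collect()

    def server_side_copy(self, src, dst):
        # Metadata only: the copy references the source's existing segments.
        self.objects[dst] = list(self.objects[src])

    def stored_mib(self):
        return sum(self.segments.values())

    def _collect(self):
        # Drop segments that no object references any more.
        used = {sid for ids in self.objects.values() for sid in ids}
        for sid in list(self.segments):
            if sid not in used:
                del self.segments[sid]


s = SegmentStore()
s.upload("orig", [64, 64])
print(s.stored_mib())  # 128
s.server_side_copy("orig", "copy")
print(s.stored_mib())  # 128: the copy shares the original's segments
s.upload("orig", [64, 64, 32])
print(s.stored_mib())  # 288: the copy keeps the old segments alive
```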