Atomic snapshots for buckets

Toyoo · November 29, 2022, 11:19pm

I talked to a heavy AWS S3 user today. He told me they’d love to see some form of atomic snapshots for whole buckets, and AWS S3 does not offer anything like that. They apparently keep tens of millions of objects in a single bucket, and those objects keep changing. Any iteration over all items in the bucket is bound to take hours. They’d love to be able to take an atomic snapshot of a bucket’s state for backup and investigation purposes.

Is this possible with server-side copies on Storj?

Alexey · November 30, 2022, 8:33am

A snapshot, yes, but it would not be instant, and thus not atomic. You can increase --transfers to speed it up, though:

./uplink mb sj://test2
./uplink cp --recursive --transfers 20 sj://test sj://test2

Toyoo · November 30, 2022, 10:09am

Yeah, for the use case he was describing, he really insisted on atomicity. He is working around the problem now by setting up an EBS volume with LVM and taking LVM snapshots instead, but he was clearly annoyed that he has to do so.

Vadim · November 30, 2022, 10:22am

I think the problem can be solved easily. Since any change to a file in Storj is a new file, you just need to keep every old instance. It takes space (a lot of space), but you can recover every change.

Toyoo · November 30, 2022, 10:23am

He stores hundreds of terabytes this way. Storing every past instance would be quite expensive.

Vadim · November 30, 2022, 10:25am

You don’t need to keep them forever. Also, people usually change something like 10% of their files, or even less.

Toyoo · November 30, 2022, 10:26am

As I understand it, the oldest files in his dataset are ~3 months old. That is, the storage for past versions would grow by hundreds of terabytes every 3 months.

Vadim · November 30, 2022, 10:26am

Also, backups make a copy of all files, whereas this way only the files that change get stored again.
Every full backup would double the amount.

Toyoo · November 30, 2022, 10:30am

The idea was to take a snapshot in order to debug the state of the application and to make a consistent offline backup. So these snapshots would be short-lived.

Vadim · November 30, 2022, 10:31am

The client can always set up a time-based rule, e.g. after 3 months delete all old versions and keep only the latest and one previous version. Moreover, if the snapshots are short-lived, the client can just delete them. When you change a file in Storj you always have to upload a new file, and then you decide whether or not to delete the old one; it doesn’t happen by itself.
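
A retention rule like the one described here could be sketched as below. This is only an illustration under stated assumptions: the function name `versions_to_delete`, the `(version_id, uploaded_at)` tuples, and the 90-day default are all hypothetical and are not part of any Storj or uplink API. The client would have to track versions itself and issue the deletes.

```python
from datetime import datetime, timedelta


def versions_to_delete(versions, now, max_age=timedelta(days=90), keep_latest=2):
    """Pick the versions of ONE object that a retention rule would remove.

    versions: list of (version_id, uploaded_at) tuples (hypothetical layout).
    The newest `keep_latest` versions are always kept; anything beyond
    those is removed once it is older than `max_age`.
    """
    # Newest first, so the slice below skips the versions we always keep.
    ordered = sorted(versions, key=lambda v: v[1], reverse=True)
    candidates = ordered[keep_latest:]
    return [vid for vid, uploaded_at in candidates if now - uploaded_at > max_age]


# Example: four versions of one object, checked on November 30, 2022.
now = datetime(2022, 11, 30)
history = [
    ("v1", datetime(2022, 1, 1)),
    ("v2", datetime(2022, 6, 1)),
    ("v3", datetime(2022, 11, 1)),
    ("v4", datetime(2022, 11, 29)),
]
doomed = versions_to_delete(history, now)
```

Here `v4` and `v3` are kept as the latest and previous versions, and `v2` and `v1` are selected for deletion because both are more than 90 days old.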

Toyoo · November 30, 2022, 10:39am

The difficulty is not in storing the files; the difficulty is in knowing exactly what the state of the bucket was at snapshot time. That is, this file existed and had this content, that file didn’t exist yet.

Manually querying all files would be too slow.

Pentium100 · November 30, 2022, 11:28am

Snapshots take up space as well, since they work the same way (some way of keeping what has changed).

But yeah, since every change in Storj is a new file, Storj is basically COW (copy-on-write), so adding snapshots to that should not be too difficult: just keep the old data until the snapshot is deleted.
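
To make the copy-on-write idea concrete, here is a toy in-memory sketch. It is not how Storj works internally and none of these names exist in Storj; `VersionedBucket` and its methods are hypothetical. The assumption is that every write gets a global sequence number, so taking a snapshot only means remembering the current number, and old versions are kept for as long as some snapshot can still see them.

```python
import itertools


class VersionedBucket:
    """Toy bucket where every write appends a new version (copy-on-write)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._last_seq = 0
        self._versions = {}      # key -> list of (seq, value); None = deleted
        self._snapshots = set()  # sequence numbers of live snapshots

    def put(self, key, value):
        self._last_seq = next(self._counter)
        self._versions.setdefault(key, []).append((self._last_seq, value))

    def delete(self, key):
        self.put(key, None)  # a tombstone is just another version

    def snapshot(self):
        # Constant time: no objects are listed or copied.
        self._snapshots.add(self._last_seq)
        return self._last_seq

    def release(self, snap):
        self._snapshots.discard(snap)

    def get(self, key, at=None):
        # Newest version that is not newer than the snapshot.
        for seq, value in reversed(self._versions.get(key, [])):
            if at is None or seq <= at:
                return value
        return None

    def list_keys(self, at=None):
        return sorted(k for k in self._versions if self.get(k, at) is not None)

    def compact(self):
        # Drop versions that neither the live bucket nor any snapshot can see.
        for key, versions in self._versions.items():
            keep = {versions[-1][0]}
            for snap in self._snapshots:
                visible = [seq for seq, _ in versions if seq <= snap]
                if visible:
                    keep.add(visible[-1])
            self._versions[key] = [v for v in versions if v[0] in keep]


bucket = VersionedBucket()
bucket.put("a.txt", b"v1")
snap = bucket.snapshot()
bucket.put("a.txt", b"v2")
bucket.put("b.txt", b"new")
bucket.compact()  # the v1 data survives because snap still references it
```

After this, `bucket.get("a.txt", at=snap)` still returns the old content and `bucket.list_keys(at=snap)` does not include `b.txt`, while reads without `at` see the current state. Calling `bucket.release(snap)` followed by `bucket.compact()` frees the old version.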

Jacob · December 1, 2022, 4:31am

@Toyoo Would this user be willing to send us more details or get on a call to talk to a product manager at Storj?

We’d love to learn more about exactly the problem they are trying to solve. Send me a PM if they are willing to chat.

Toyoo · December 1, 2022, 8:14am

Hmmm… I’ll send him a message, we’ll see.

Alexey · December 2, 2022, 3:36am

Maybe they can use Restic; see “Using Restic for Backups” in the Storj Docs.
It has a snapshot feature, and its snapshots do not take up space 1:1.

By the way, a server-side copy on Storj DCS doesn’t use 100% of the space for the copy until you change the original; then it will start to consume space.