How does overwriting work?

When uploading with Filezilla and the file does already exist, I have the option to overwrite.
Does it literally overwrite, meaning the distribution of shards remains the same (which I doubt)?
Or does it upload to a completely new set of nodes and delete the old copy afterwards, resulting in a totally new distribution of shards?

It’s a completely new upload. Pieces should not land on the same nodes. But I think they could, due to caching, if the interval between uploads is too short (a few minutes).


Thanks! And wouldn’t that make a nice undo feature for when a user accidentally chooses to overwrite a file on upload?
Imagine that in the middle of an upload you notice you are overwriting the wrong file. It would be easy to abort and revert to the existing file.

Unfortunately it’s not so easy. You would need to keep versioning information in the metadata too (on the satellites) and actually keep copies of your files on the network, paying 2x, 3x and so on, depending on the number of versions.
I think this feature could be implemented sometime in the future, but for now your application would have to do it itself. The current implementation has everything needed: it is object storage, so you can attach metadata to each object and read that metadata back, interpreting it as, for example, a version number and the real name of the file.
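To make the idea concrete, here is a minimal sketch of such an application-side scheme. This is purely illustrative — the class and key naming are made up, and a real implementation would go through the Storj uplink library rather than an in-memory dict — but it models the suggestion above: each "overwrite" uploads a new object under a unique key and records the real file name and version number as metadata.

```python
class VersionedStore:
    """Client-side versioning on top of a flat object store (toy model).

    Each 'overwrite' actually uploads a new object under a unique key
    and records the real file name and version number as metadata,
    so older versions remain downloadable (and billable).
    """

    def __init__(self):
        # key -> (data, metadata); stand-in for a bucket of objects
        self.objects = {}

    def upload(self, real_name, data):
        # Count existing versions of this logical file to pick the next number.
        version = 1 + sum(
            1 for _, meta in self.objects.values()
            if meta["real-name"] == real_name
        )
        key = f"{real_name}.v{version}"
        self.objects[key] = (data, {"real-name": real_name, "version": version})
        return key

    def latest(self, real_name):
        # Pick the object whose metadata carries the highest version number.
        candidates = [
            (meta["version"], data)
            for data, meta in self.objects.values()
            if meta["real-name"] == real_name
        ]
        return max(candidates)[1] if candidates else None


store = VersionedStore()
store.upload("report.pdf", b"first draft")
store.upload("report.pdf", b"final draft")
print(store.latest("report.pdf"))  # prints b'final draft'
```

The point is that the store itself never overwrites anything; "overwrite" is just an upload plus a metadata convention, which is exactly why each version costs full storage.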

Allowing versioning for the purpose of backups is definitely an interesting feature, though it would probably come with an expansion of possible operations. For example, a normal delete should only remove the current file but keep the history, so you would also need a way to delete all historic versions. Then there are the considerations for version rotation. And the best implementation of such functionality would use incremental changes to avoid a lot of data duplication. However, that requires actually being able to read the data, so generating incremental data can only be done client side (since the data is encrypted everywhere else), which would require downloading the existing data every time there is an upload.

So yeah, a pretty massive extension of functionality, but still very interesting. Probably not the first thing on the backlog though. :wink:

The background why I asked was much simpler:
You know that as the uploader you have to upload about 2.7x the size of your actual file. Now when you mess up and overwrite the wrong file, you have to do all of that again. That's totally acceptable if the file has in fact already been overwritten, but with the current implementation the old file may still be recoverable from such a failure. And if you handle large files, where upload times really matter, being able to recover from such mistakes would be a great thing.
So the idea would be that interruption and recovery could be made possible while the upload is still in progress, and maybe for a minute or so after the upload has finished, before the final deletion of the old file.
It’s just some idea.

That would require the delete to be delayed. I’m assuming overwrite is implemented as a delete followed by an upload, with the delete going first, since otherwise the upload would fail because an object with that name already exists.
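A delayed-delete scheme could be sketched roughly like this (again purely illustrative, not how the satellite actually sequences operations): instead of deleting the old object first, it is moved aside and only purged after a grace period, so an abort or undo can restore it.

```python
import time


class Bucket:
    """Toy object store with a grace period before old versions are purged."""

    GRACE_SECONDS = 60  # hypothetical undo window after an overwrite

    def __init__(self):
        self.objects = {}  # key -> data
        self.trash = {}    # key -> (old data, time it was displaced)

    def overwrite(self, key, data):
        # Instead of deleting the old object first, move it aside.
        if key in self.objects:
            self.trash[key] = (self.objects[key], time.monotonic())
        self.objects[key] = data

    def undo_overwrite(self, key):
        # Restore the previous object if we are still inside the grace period.
        if key in self.trash:
            old_data, displaced_at = self.trash.pop(key)
            if time.monotonic() - displaced_at < self.GRACE_SECONDS:
                self.objects[key] = old_data
                return True
        return False

    def collect_garbage(self):
        # Permanently drop trashed objects older than the grace period.
        now = time.monotonic()
        self.trash = {
            k: (d, t) for k, (d, t) in self.trash.items()
            if now - t < self.GRACE_SECONDS
        }
```

Note that during the grace period both the old and the new pieces exist on the network, so even this "simple" undo temporarily doubles the stored (and paid-for) data for that object — which is part of why it overlaps so much with full versioning.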

So yeah, your feature may be much simpler, but I think it would already require implementing a good part of the logic needed for versioning. So at that point you might as well build a great additional feature out of it.