I don’t know much about the Badger cache, for example how long entries stay cached or which process makes use of it, either by filling the cache or by reading from it.
I mean yes, of course, if the data is in the Badger cache, why read it again from the file system? And I’ll make the suggestion again: why stat every single file in the trash instead of using the sizes of the folders the files have just been moved to?
For the trash I don’t think you would need a huge database, as it is very structured: there is a maximum of 1024 subfolders per date folder. A simple text file would do. It would contain up to 1024 lines (one per subfolder), each with a number representing that subfolder’s size in bytes. Even at a generous 20 bytes per line, that comes to roughly 20 KB, so not a very huge text file.
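To make that concrete, here is a minimal sketch in Go of what writing and reading such a file could look like. The `sizes.txt` file name, its location inside the date folder, and the `<prefix> <bytes>` line format are all my own assumptions for illustration, not anything the storage node actually implements:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// writeSizes persists one "<prefix> <bytes>" line per trash subfolder,
// e.g. "aa 1048576". With at most 1024 prefixes the file stays tiny.
func writeSizes(dateDir string, sizes map[string]int64) error {
	f, err := os.Create(filepath.Join(dateDir, "sizes.txt"))
	if err != nil {
		return err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for prefix, size := range sizes {
		fmt.Fprintf(w, "%s %d\n", prefix, size)
	}
	return w.Flush()
}

// readSizes loads the file back; summing the values gives the total
// trash size for the date folder without stat-ing a single piece file.
func readSizes(dateDir string) (map[string]int64, error) {
	f, err := os.Open(filepath.Join(dateDir, "sizes.txt"))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sizes := make(map[string]int64)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var prefix string
		var size int64
		if _, err := fmt.Sscanf(scanner.Text(), "%s %d", &prefix, &size); err == nil {
			sizes[prefix] = size
		}
	}
	return sizes, scanner.Err()
}

func main() {
	dir := os.TempDir()
	_ = writeSizes(dir, map[string]int64{"aa": 1048576, "ab": 524288})
	sizes, _ := readSizes(dir)
	var total int64
	for _, size := range sizes {
		total += size
	}
	fmt.Println("total trash bytes:", total)
}
```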
I have found 2 interesting links/quotes that I want to add here:
- We seem to do something similar for the save-state-resume-filewalker:
lazyfilewalker optimization for storage node used spaced calculation · Issue #6900 · storj/storj: https://github.com/storj/storj/issues/6900
Discussion here: https://review.dev.storj.io/c/storj/storj/+/12806?tab=comments
TLDR:
… storing the last_checked date + size for each prefix directory.
So why shouldn’t it be possible to store the size of a prefix directory after all trash files have been moved into it, or even while they are being moved?
So with some kind of cache for the trash, we would not have to stat every single file that a retain process has just moved into a trash folder.
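As a sketch of that idea: retain already knows each piece’s size at the moment it moves it (it stats the file anyway), so it could simply add that size to a per-prefix counter and persist the counters once at the end. The `trashPiece` helper and the paths below are hypothetical, purely for illustration:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// trashPiece moves one piece file into its trash prefix folder and
// returns the file's size, taken from the stat we do anyway before moving.
func trashPiece(piecePath, trashPrefixDir string) (int64, error) {
	info, err := os.Stat(piecePath)
	if err != nil {
		return 0, err
	}
	if err := os.MkdirAll(trashPrefixDir, 0o755); err != nil {
		return 0, err
	}
	dst := filepath.Join(trashPrefixDir, filepath.Base(piecePath))
	if err := os.Rename(piecePath, dst); err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	// Per-prefix byte counters, updated as retain moves pieces.
	sizes := make(map[string]int64)

	// Hypothetical batch of pieces that a retain run decided to trash.
	pieces := map[string]string{
		"/store/blobs/aa/piece1.sj1": "/store/trash/2024-06-01/aa",
		"/store/blobs/ab/piece2.sj1": "/store/trash/2024-06-01/ab",
	}
	for piece, prefixDir := range pieces {
		size, err := trashPiece(piece, prefixDir)
		if err != nil {
			continue // piece may already be gone; skip it
		}
		sizes[filepath.Base(prefixDir)] += size
	}

	// Persist the counters once per date folder (see the sizes.txt sketch
	// above) instead of stat-ing every trashed file a second time.
	fmt.Println(sizes)
}
```

The point of this design is that the per-file stat happens exactly once, at move time, and later questions about trash size are answered from the tiny counters file instead of a full walk over the trash.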