Not what I meant. Process each bloom filter as soon as it arrives. If another one arrives while the first is still running, it should first join the first one's pass and be applied to the two-letter directories the first one has not scanned yet, and only then go back to the two-letter directories the first one had already finished. The same goes for any further bloom filters.
For example, in pseudo-Python code:
import collections
import os
to_scan = collections.defaultdict(list)
# each time a bloom filter arrives:
bloom_filter = BloomFilterObject()
for dir in '22', …, 'zz':
    to_scan[dir].append(bloom_filter)
# a separate goroutine running the actual bloom filter scanning:
while to_scan:
    dir, bloom_filters = to_scan.popitem()
    for file in dir:
        for filter in bloom_filters:
            if not filter.contains(file) and file older than filter:
                os.remove(file)
                break
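To make the ordering concrete, here is a small runnable sketch of the same idea. Everything in it is made up for illustration: `Filter` uses an exact set instead of a real bloom filter, `Scheduler` and `store` are hypothetical names, the "directories" are in-memory dicts rather than files on disk, and timestamps are plain integers. It also picks the oldest pending directory first instead of using `popitem()`, which on a Python dict returns the most recently inserted key; that is an assumption on my side, chosen so that directories not yet scanned come before the ones that were already finished.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    """Hypothetical stand-in for a bloom filter: an exact set plus a creation time."""
    name: str
    created: int
    keep: set = field(default_factory=set)

    def contains(self, piece):
        return piece in self.keep


class Scheduler:
    def __init__(self, store):
        self.store = store   # dir name -> {piece name: piece mtime}
        self.to_scan = {}    # dir name -> pending filters (dict keeps insertion order)
        self.log = []        # (dir, filter names) for every pass, for illustration

    def add_filter(self, flt):
        # A new filter joins every directory. Directories already finished
        # by earlier filters are re-queued at the end of to_scan.
        for d in self.store:
            self.to_scan.setdefault(d, []).append(flt)

    def step(self):
        """Scan one directory with all filters pending for it; False when idle."""
        if not self.to_scan:
            return False
        d = next(iter(self.to_scan))       # oldest pending directory first
        filters = self.to_scan.pop(d)
        self.log.append((d, [f.name for f in filters]))
        pieces = self.store[d]
        for piece, mtime in list(pieces.items()):
            for f in filters:
                # delete only pieces older than the filter and missing from it
                if mtime < f.created and not f.contains(piece):
                    del pieces[piece]
                    break
        return True


store = {
    "aa": {"p1": 1, "p2": 1},
    "ab": {"p3": 1, "p4": 5},
    "ac": {"p5": 1},
}
sched = Scheduler(store)
sched.add_filter(Filter("f1", created=2, keep={"p1", "p3", "p5"}))
sched.step()                               # f1 finishes "aa" on its own
sched.add_filter(Filter("f2", created=10, keep={"p1", "p4"}))
while sched.step():                        # f2 joins f1 on "ab" and "ac", then redoes "aa"
    pass
print(sched.log)
```

In this run the second filter arrives after `aa` is done, so `ab` and `ac` are each walked once with both filters, and only `aa` gets a second pass with `f2` alone. Each directory is read once per pass no matter how many filters are pending for it.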
This seems like an interesting problem that the community might be able to help with. Last time this topic was discussed, Storjlings mentioned that it's a matter of memory usage. I've proposed an approach to solve this before. Maybe it could be of help to you?
You probably haven’t seen this post, but this statement has been challenged before.