Current situation with garbage collection

You’ve nerdsniped me. Please admit that this was your goal.

Some sample code for computing bloom filters with a negligible amount of RAM and in a fraction of the time: GitHub - liori/bloom_filters_for_nodes. `generator` is a sample dataset generator, and `bloomfilter` computes the bloom filters. For a dataset of us1.storj.io's size you need ~16 TB of storage, but you probably know that.

Can’t guarantee it’s correct because I’m drunk. But I’ve run it on my i3-9100F, and I estimate I could generate bloom filters for us1.storj.io in ~20 hours with a single CPU thread.

The lousy attempts at concurrency didn’t pay off for me; just set `thread_count` to 1. An i3.16xlarge is not much faster :person_shrugging: I/O is the bottleneck, but you probably know that. I guess I would have to get a machine like this one to actually take advantage of concurrency.

Algorithm details: basic radix sort.
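For anyone curious how radix partitioning plus per-node bloom filters fit together, here's a minimal in-memory sketch. This is not the repo's actual code; the SHA-256-derived bit positions, the 512-bit filter size, and `k=3` hash functions are all illustrative assumptions. The point is that one radix pass needs only a tiny buffer per bucket, so on disk the buckets become append-only files and RAM stays bounded no matter how big the dataset is.

```python
import hashlib

def _bit_positions(item: bytes, m: int, k: int):
    """Derive k bit positions from SHA-256 of the item (illustrative choice)."""
    digest = hashlib.sha256(item).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def bloom_add(filt: bytearray, item: bytes, k: int = 3) -> None:
    """Set k bits in the filter for this item."""
    for bit in _bit_positions(item, len(filt) * 8, k):
        filt[bit // 8] |= 1 << (bit % 8)

def bloom_contains(filt: bytearray, item: bytes, k: int = 3) -> bool:
    """True if all k bits are set (no false negatives, rare false positives)."""
    return all(filt[bit // 8] >> (bit % 8) & 1
               for bit in _bit_positions(item, len(filt) * 8, k))

def radix_partition(records, byte_index):
    """One radix pass: scatter (node_id, piece_id) pairs into 256 buckets
    by a single byte of node_id. On disk these would be 256 append-only
    files written with sequential I/O only."""
    buckets = [[] for _ in range(256)]
    for node_id, piece_id in records:
        buckets[node_id[byte_index]].append((node_id, piece_id))
    return buckets

# Demo: group pieces by node, then build one small filter per node.
records = [
    (b"\x01" + b"A" * 31, b"piece-1"),
    (b"\x01" + b"A" * 31, b"piece-2"),
    (b"\x02" + b"B" * 31, b"piece-3"),
]
filters = {}
for bucket in radix_partition(records, 0):
    for node_id, piece_id in bucket:
        filt = filters.setdefault(node_id, bytearray(64))  # 512-bit toy filter
        bloom_add(filt, piece_id)
```

After the radix passes, each node's pieces sit contiguously, so the per-node filter can be built and flushed one node at a time.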
