How bloom filters work

I see, I guess I don’t get how bloom filters work ^^’
Alright, then I guess it’s not such a problem :slight_smile:

The bloom filter is created to always match all pieces that should be on the node. The node then checks which pieces don’t match the bloom filter and removes them. (though for reasons of efficiency, the size of the bloom filter is reduced by making it match about 10% of what’s not supposed to be on the node as well. So with every run about 10% of garbage stays behind to be cleaned up the next time)
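
Roughly, in code terms, one retain pass looks something like this (a minimal Python sketch of the idea described above, not Storj’s actual code; the names `may_contain` and `move_to_trash` are invented for illustration):

```python
def run_garbage_collection(local_piece_ids, bloom_filter, move_to_trash):
    """Sketch of one GC pass: the filter matches every piece the node SHOULD
    have (plus roughly 10% of garbage as false positives); anything that does
    not match is definitely unwanted and can be moved to trash."""
    kept = trashed = 0
    for piece_id in local_piece_ids:
        if bloom_filter.may_contain(piece_id):
            kept += 1                # a real piece, or surviving garbage (~10%)
        else:
            move_to_trash(piece_id)  # cannot be a wanted piece, safe to remove
            trashed += 1
    return kept, trashed
```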

Does this mean it takes 10 bloom filter “passes” for removing all orphan pieces?

:thinking: Hmm, that doesn’t add up… I guess it’s more like “the size of the bloom filter is reduced by making it match about 90% of what’s not supposed to be on the node”


As a side note: does this rely on databases? If you lost all databases (that happened to me once; it doesn’t kill the node, but stats are lost for the current month), would bloom filters still work?

It doesn’t rely on databases. The bloom filter tells the node what not to delete. So if 10% of garbage matches the bloom filter, the remaining 90% gets deleted.
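
To put rough numbers on a single pass (plain arithmetic; the 100 GB figure below is made up purely for illustration):

```python
garbage_gb = 100.0            # hypothetical amount of garbage before a GC run
false_positive_rate = 0.10    # ~10% of garbage still matches the filter

trashed_gb = garbage_gb * (1 - false_positive_rate)  # ~90 GB moved to trash
left_gb = garbage_gb * false_positive_rate           # ~10 GB survives this run
print(trashed_gb, left_gb)                           # 90.0 10.0
```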

It is likely that false positives (ie. garbage that is not removed) will be the same, or very similar, across runs, so no, unfortunately not.

I’m having a hard time understanding all that. Besides, if it doesn’t rely on databases, it would mean the node would have to browse all pieces to find those that should be deleted, which would generate crazy amounts of IO. But that’s not what’s happening…

I’m confused :confused:

But let’s get back on topic: it does answer my initial question about what happens to “foreign” pieces that could have been added intentionally among STORJ pieces: they would eventually get deleted by the garbage collector! That’s good enough for me :slight_smile:
Cheers.

Here’s a simple explanation of Bloom filters; maybe it will help. Note that the following is not an exact description of Bloom filters as implemented in Storj, but rather a generic description of Bloom filters with some guesses as to how it applies to the garbage collection process.

Imagine you have data about English words. Each file is an entry about one word: its definition, example usages, etymology, etc. My local dictionary lists 102401 words like yodeling, lofts, particularly, fatigue, standing, contour, etc., and a single node may keep data about tens of words at a time, which is already a long list. Imagine calling a SNO over the phone and telling them 30 words.

Instead, you call your SNO and tell them that they are supposed to have data only about words that use only the letters “abcdeghiklmnoprstuvy”. So now you need to tell them at most 26 letters, regardless of the number of words the SNO is expected to store! This is your Bloom filter. yodeling is fine, standing is also fine—these are words you expect them to have. But if, let’s say, the SNO had the word fatigue, then, as it has the letter f, the SNO can remove it, because you no longer expect that SNO to have any words with f. On the other hand, your SNO will think they need to keep the words particularly and contour. Last Friday you tried to call them to say you want those removed, but their phone was turned off.

Now, after a week, the SNO is requested to add three new words and remove two old ones. The list of letters is now “abcdeghiklmopqrstuvy”. contour no longer qualifies (the letter n disappeared), but particularly still fits. As the list of letters didn’t change much, the false positives will likely be mostly the same.
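
In case it helps, here’s that letter analogy as a short runnable sketch (purely illustrative, nothing Storj-specific):

```python
allowed = set("abcdeghiklmnoprstuvy")   # the "filter" the satellite phoned in

def may_keep(word):
    # A word passes the filter only if every one of its letters is allowed.
    return set(word) <= allowed

print(may_keep("yodeling"))      # True  - genuinely expected on the node
print(may_keep("fatigue"))       # False - contains 'f', safe to delete
print(may_keep("particularly"))  # True  - false positive, garbage that survives
print(may_keep("contour"))       # True  - false positive... for now

allowed = set("abcdeghiklmopqrstuvy")   # next week's filter: 'n' gone, 'q' added
print(may_keep("contour"))       # False - finally collected
print(may_keep("particularly"))  # True  - still a false positive
```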

Bloom filters are like that, except instead of letters in a word they use bins formed from cryptographic hashes, so you can make more bins (not just 26 letters; more like thousands or even millions of different letters—imagine Chinese characters ;-)). The more bins, the harder it is to make mistakes, but also the more data to send every time, so it’s a trade-off. Storj chose the trade-off so that the rate of mistakes is about 10%. Besides, your node is expected to stay online most of the time, so if there’s garbage, it’s only because the node was offline at the time the satellite tried to tell the node to remove it.
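
A real Bloom filter is the same trick with hash-derived bins instead of letters. Here’s a minimal generic sketch (my own toy code, not Storj’s implementation; the class name and parameters are made up):

```python
import hashlib

class TinyBloomFilter:
    """Toy Bloom filter: m bins (bits), k hash functions."""

    def __init__(self, num_bins, num_hashes, seed=b""):
        self.m = num_bins
        self.k = num_hashes
        self.seed = seed
        self.bits = bytearray((num_bins + 7) // 8)

    def _bins(self, item: bytes):
        # Derive k bin indices from a cryptographic hash, salted per index.
        for i in range(self.k):
            digest = hashlib.sha256(self.seed + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes):
        for b in self._bins(item):
            self.bits[b // 8] |= 1 << (b % 8)

    def may_contain(self, item: bytes) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[b // 8] & (1 << (b % 8)) for b in self._bins(item))
```

The important property is the asymmetry: may_contain can wrongly answer “probably yes” (that’s the surviving garbage), but it can never wrongly answer “no”, so a piece the satellite wants kept is never deleted.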

What is specifically relevant to this thread is that if the SNO missed the request to delete contour, this word will only get garbage-collected when the filter loses at least one of its letters. So if the SNO were expected to keep tour and constraint for the next ten years, they would never learn that contour was to be removed.

There are some tricks to quickly compute which bins are required, and the node can store some information on which bins correspond to which files, so that removal is fast. There’s also the trick of using multiple hashes instead of a single one. But I don’t know enough details of Storj’s implementation of the Bloom filters, so can’t comment on that.
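
For the curious, the textbook sizing formulas make that trade-off concrete; a quick sketch (generic Bloom filter math, not necessarily the exact parameters Storj picked):

```python
import math

def bloom_parameters(n_items, false_positive_rate):
    """Classic sizing: bits m = -n*ln(p)/ln(2)^2, hashes k = (m/n)*ln(2)."""
    m = math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

# For a node holding a million pieces at the ~10% mistake rate mentioned above:
bits, hashes = bloom_parameters(1_000_000, 0.10)
print(bits, hashes)   # ~4.8 million bits (about 0.6 MB) and 3 hash functions
```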

This is a great explanation of bloom filters; if we ever get a Storj community wiki, this would be a good addition. I appreciate how you broke out the whole example as a demonstration.

Not true. The process uses a different seed each time specifically to prevent this.
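
A tiny sketch of why the per-run seed matters (generic hashing, not Storj’s actual seeding scheme): with a different seed, a different small fraction of garbage slips through each run, so repeated runs eventually collect almost all of it.

```python
import hashlib

def bin_for(item: bytes, seed: bytes, num_bins: int) -> int:
    # One hashed bin per item; changing the seed changes which items collide.
    digest = hashlib.sha256(seed + item).digest()
    return int.from_bytes(digest[:8], "big") % num_bins

wanted = [b"piece-1", b"piece-2"]                         # pieces to keep
garbage = [f"garbage-{i}".encode() for i in range(1000)]  # pieces to remove

survivors = {}
for seed in (b"run-A", b"run-B"):
    occupied = {bin_for(p, seed, 32) for p in wanted}
    survivors[seed] = {g for g in garbage if bin_for(g, seed, 32) in occupied}

# Each run a small fraction of garbage survives as false positives, but with a
# different seed it is mostly *different* garbage, so repeated runs clean it up.
print(len(survivors[b"run-A"]), len(survivors[b"run-B"]),
      len(survivors[b"run-A"] & survivors[b"run-B"]))
```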

That’s exactly what happens. But it only needs to check against filenames. It depends on the file system and hardware, but on my slowest node hardware this process can take hours. On the fastest, which is SSD accelerated, it takes a few minutes at most. If you turn on debug logging you can see the retain process running and going through the files which are moved to trash.

That was great @Toyoo, thanks a bunch! :+1:

Are we 100% sure that’s how it works? I do have a lot of garbage (2 to 15 GB, kinda constantly) on my nodes although they are online as much as possible (99.85+%). Even the one on my 7200 RPM CMR drive has some data in the trash (~3GB currently).

How often do satellites send bloom filters? I mean, one of my nodes takes ages (almost 30 hours) to browse all files whenever it updates. It surprises me that I did not regularly notice it doing crazy amounts of IO because of the garbage collection. Besides, that sounds like a terrible thing to do to our disks! Updates are already quite heavy on them, but if the GC does this regularly as well, they are goners! :sweat_smile:

In a perfect world, that’s how it works. In my experience slower nodes may accept pieces just a little too late, which can cause completely finished transfers to end up in garbage collection. It’s a fallback mechanism, which means it’s kind of hard to determine where data that was left behind came from.

I don’t know exactly. At least once a week, maybe more often. But since it only has to read file names, it can be a lot faster than the file walker when you start the node. I only notice it taking any significant time on my node which runs on a Drobo, connected over USB2, using NTFS on a Linux host. That’s kind of a worst-case scenario. But even though all of that is a bit of a mess, I think it’s the use of NTFS that has the biggest impact. Since NTFS doesn’t keep all file metadata in one place, but rather stores it along with the files, it requires a lot more effort to compare filenames than other file systems, which store that stuff neatly together.

Ah right, it’s not every 12 hours or so… alright.

What else does a node do when starting up then? I thought it simply had to browse file names & folder structures. Does it actually read all files’ content?

Don’t know exactly, but it at least deals with file sizes as well. We’re getting a little (extremely) off topic though. And I think the main subject here is important. So better stop the distractions.

It’s sent every 5 days

Oh, good to know! I thought this approach would be too expensive for nodes and satellites, as it essentially means you can’t update bloom filters computed last time, but if it is not, great!

Thanks @Alexey for splitting the topic!

And thanks for that addition. :slight_smile: I wasn’t sure exactly and was too lazy to look it up, but that sounds exactly like what I’m seeing.

So… where exactly are these bloom filters stored on nodes when they are received, if not in a db? I imagine it’s a list of text? So it’s a text file?
Second: why do we need to move pieces to trash? Can’t they just be marked by GC for deletion after 7 days, in the metadata, in inodes, etc., somewhere in the filesystem, with an expiration date and time, without moving the files?

It is probably because Storj does not want to pay SNOs for the deleted data.

It’s not the goal.
The pieces are moved to the trash to have the ability to recover them if the bloom filter was wrong and deleted too many pieces, or if there were other similar issues which could potentially affect customers.

But it is free storage for Storj.