Bloom Filter Processing Order Issue: Held Back Filters Not Processed

Here is something that I just noted:

Due to the known issue

, the retain concurrency is currently set to 1, which means that only one Bloom filter is processed at a time. This has resulted in a backlog of Bloom filters waiting to be processed. It looks like this:

ls config/retain
pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa-1722448799995125000.pb  ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa-1722880799999530000.pb
pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa-1722802182043487000.pb  v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa-1722967199982862000.pb
qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa-1723053598309773000.pb

The current processing order appears to be alphabetical, with the system processing Bloom filters from the same satellite in sequence. For example, after processing the first Bloom filter for the Saltlake satellite for a week, the system immediately started processing the second Bloom filter for the same satellite, leaving out Bloom filters from other satellites.
I don’t think this is how it should be. I believe that the system should process Bloom filters in a First-In-First-Out (FIFO) order, based on their arrival dates. According to the last modified = arrival dates the correct processing order should be:

1. pmw (03.08. 11.41)
2. v4w (08.08 04.33)
3. pmw (08.08. 09.46)
4. qst (08.08. 22.33)
5. ukf (10.08. 01.46)

Otherwise the system would always keep processing only Saltlake filters as long as new Saltlake filters are present. This means retaining and trash deletion for the other satellites are effectively blocked.

Furthermore, I think that with concurrency > 1, the system should process Bloom filters from different satellites concurrently, rather than processing multiple filters from the same satellite in parallel. This would also help to prevent the system from getting stuck processing filters from a single satellite, and ensure that filters from other satellites are processed as well in a timely manner.

3 Likes

I do not think that it’s matter, because all BFs should be processed anyway. If the prepared BF takes too long time, that’s also mean that your node likely has here most of the garbage.

I believe it does matter. Satellites are independent and share the same available free space. SNO does not have any influence at what time and how many Bloom filters a satellite operator is going to send.
Also a SNO does not have influence how the satellite is named. I don’t think a satellites BF should be always processed first just because its name starts with a.
If it really does not matter in your opinion then there should be no reason not to switch it to a sort order by date first. You could also make this configurable.

1 Like

I do not think that we sort it at all. It’s a default list, returned by the OS.

I can imagine this is the case. As originally the Bloom filters didn’t even get stored I am sure nobody thought of a case that they are stacking up and multiple of them waiting to be processed.

The OS has no concept of concurrency, satellites or independency. So it needs to be told which is the correct order to return that list.
This should be fixed together with the concurrency issue to make Bloom filters processed by date and to implement concurrency per satellite rather than processing multiple Bloom filters from a single satellite at the same time.

I still not sure that we need to spend time on this. All BF should be processed and order doesn’t matter with concurrency 1.
Or do you mean, that processing of two BF for the same satellite with concurrency 2 or higher may be a root cause of hanging the retain process, because they blocks each other?

No it’s 2 separate issues. But:
The reason we cannot use concurrency > 1 at the moment is that the node will happen to process 2 or more Bloom filters of the same satellite. The issue describes problems when processing Bloom filter from the same satellite not from different satellites. One reason this happens is also the order of processing when there are more than one Bloom filter for a satellite. That’s why my suggestion could also help a little with this issue. (Not in all cases, but in some)

I don’t know if the issue will be solved in a way that makes it possible to process 2 Bloom filters at the same time. But even if, I would still prefer that with concurrency > 1 different satellites get processed rather than multiple Bloom filters from the same satellite.
But also in that case ordering by modified date has better outcome than alphabetical ordering.

Seems like this approach would be helpful for you, so bumping it into this thread. Just wondering how many other nodes are affected—if it’s only yours, then there’s probably not much of value.

Storj does seem to have access to metrics from nodes, but I’m not sure there’s a metric “how many bloom filters are waiting to be processed”. If there was one, they would be able to learn about other nodes.

Here is something that I have just found:

https://review.dev.storj.io/c/storj/storj/+/13081

With concurrency flag set to 1 retain queue will process entries one by one ordering filters with CreatedBefore fields (from oldest to newest value).

Is this relevant? I dont’t know anything about a CreatedBefore field. Maybe I missed that.

There are also some interesting comments. It seems there is some thinking about filtering. But I don’t understand what the result is they came up with.

It’s ordered by the CreatedBefore, not by a filename.

CreatedBefore it’s a date range (shifted to the past) where BF should operate, without that date the filter may delete newly uploaded pieces (because on the moment of the BF generation these pieces didn’t exist yet).

You may see it in the logs filtered by retain and Prepar.

grep retain /mnt/x/storagenode2/storagenode.log | grep -E "Prepar|Move"

If that’s the case then it is even better.
I need to check if I can observe that on my nodes. Until now it appeared to be in alphabetical order.

Obviously the CreatedBefore information is not clearly visible. Where is it written and this would mean the node scans all Bloom filters for this information and based on this decides which one to retain?

In the BF file, however, you may use logs to see it in the decoded form.

But something does not add up. These are the Bloom filters with the modified dates on one of my nodes:

pmw 12.08. 01.24
qst 09.08. 03.53
ukf 11.08. 23.48
v4w 07.08. 23.06

On the file system I can see from the timestamps of the subdirectories that it is currently retaining the v4w BF. It started yesterday at around 23.57. with subdir letter aa as expected.

But I can also see that before that it was processing the ukf BF. From the logs it processed it until yesterday at about 23.55.
But it seems that process got interrupted as it stopped with letters ul while it would be expected that it should be at zz when it has finished. Also the BF for US1 is still there.
Before ul it was processing subdir uk which has a timestamp 11.08 23.33. Then it proceeded to ul which has a timestamp 11.08. 23.55.
The BF for the US satellite has timestamp 11.08.23.48.

And at the moment it is processing EU1 BF with timestamp 07.08. 23.06.

I don’t understand that pattern.

it’s a newest one for US1 as far as I can see.

it’s a US1 BF. Do you know why it was interrupted? Did you have an error or the node is restarted?

It doesn’t matter. The only matter is a creation date of the piece and what pieces the BF is processing (see the “Created Before” date in the “Prepared” for retain). The date of the folder doesn’t matter.
And since you seems referring to the access date, I would assume that you mounted this drive to use atime, which slow down the filewalkers a little bit on every file. When you have millions of files it become noticeable.

And it’s logical, because this BF has an oldest creation date (and likely the oldest “Created Before” inside).
So the only question, why US1 BF was interrupted. Did you check, maybe it is finished, you need to check for “retain” and “Moved”. And this one for ukf (US1) - is a fresh one, because the oldest one has been processed.

By the way, it’s better to track folders differently:

I had to edit my previous post as I had the subdirs in the wrong order.

To clarify:
I see timestamp processing for uk subdir: 11.08. 23.33 and ul subdir: 11.08. 23.55

Yes, the US1 bloom filter is the newest.
The timestamp I am referring to is the modified time. The partition is mounted with noatime.
So when I look at the timestamps of the subdirectories it seems to me that subdir uk must have been processed by a different Bloom filter than the current one with timestamp 11.08. 23.48.
So I am wondering where did it go as it seems it did not finish regularly. Maybe I am wrong but I would expect it to finish with subfolder zz and not stop with uk.

Probably due to some timeout.

You may be sure that it’s not finished, if you have Prepared, but not Moved for the retain process and the particular satellite.
It’s possible that there simply weren’t any pieces found to move from those folders, that’s all.

then you would have this error, it will be retain and context canceled or any other error during the way.

Logs got wiped due to restart so I cannot verify the exact reason unfortunately.

Then it’s likely the reason of the interruption, I can guess.

In the meantime 2 BF have finished processing.

I am seeing now following BFs left:

pmw 12.08. 01.24
ukf 11.08. 23.48

Dates are modified time.
It is currently processing the pmw Bloom filter:

2024-08-12T05:48:59Z    INFO    retain  Prepared to run a Retain request.     {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-08-06T17:59:59Z", "Filter Size": 765186, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}

So is "Created Before": "2024-08-06T17:59:59Z" the date the Bloom filter has been created? If so, why does it have a modified date 12.08. assuming that this is the date it arrived. That would mean it took almost a week after creation until it finally was received by the node. This is long.
Also this would mean, that the ukf Bloom filter must have a later creation date, correct? Because the pmw Bloom filter is getting processed first. So the ukf filter was created later and arrived earlier on the node?

Only this date is matter:

You may see when the BF has been generated by the Linux epoch in the filename, but remember, you need to use only first 10 numbers.