No bloom filters from US1

In the last 6 hours I have not received any audits from AP1 :thinking:

How much memory is in the production servers responsible for producing the bloom filters?

Unless a huge amount of memory is needed, it seems like an easy problem to solve by throwing DIMMs at it until it goes away :slight_smile:

The only data point I could find is this post:

Pretty old though. Since that time Storj has started generating much larger bloom filters, and there are more pieces as well, so I wouldn’t be surprised if Storj crossed 500 GB.
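
For a rough sense of how filter size scales, the standard bloom filter sizing formula m = -n ln(p) / (ln 2)^2 gives the number of bits needed for n items at false-positive rate p. Below is a minimal sketch of that formula. The piece count and false-positive rate are made-up illustration values, not Storj's actual parameters, and `bloom_filter_bytes` is a hypothetical helper name.

```python
import math


def bloom_filter_bytes(n_items: int, fp_rate: float) -> int:
    """Optimal bloom filter size in bytes for n_items at the given false-positive rate."""
    # Standard sizing formula: m = -n * ln(p) / (ln 2)^2 bits.
    bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not Storj's real settings).
    pieces_on_node = 1_000_000
    fp_rate = 0.1
    size = bloom_filter_bytes(pieces_on_node, fp_rate)
    print(f"{size / 1e6:.2f} MB for {pieces_on_node:,} pieces at p={fp_rate}")
```

The size grows linearly with the number of pieces, so if the generator builds filters for many nodes at once, its memory footprint would scale with the total piece count it has to cover.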

500 GB is a lot for a single application. But it’s not a lot for a single server.

DDR4 is cheap. Almost any modern 2-socket system should take 24 DIMMs, and with 64 GB DIMMs being a really good value right now, it takes an afternoon to build a 1.5 TB system.

I run 128 GB DIMMs in all of my work servers, and with dual CPUs and 8 channels of memory each, that brings me to 4 TB per server. We are contemplating moving to 256 GB DIMMs for 8 TB.

It’s expensive, yes, but if a single server drives garbage collection and bloom filters, and therefore determines how much non-revenue-generating data Storj is paying node operators for, it should be a drop in the bucket.
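
The capacity figures in the post above can be checked with a few lines of arithmetic. This is a minimal sketch: the helper names are hypothetical, and the 4 TB and 8 TB figures only work out if each memory channel holds 2 DIMMs, which is an assumption not stated in the post.

```python
def dimm_slots(sockets: int, channels_per_socket: int, dimms_per_channel: int) -> int:
    """Number of DIMM slots in a server with the given memory topology."""
    return sockets * channels_per_socket * dimms_per_channel


def total_memory_gb(slots: int, dimm_gb: int) -> int:
    """Total memory in GB when every slot holds a DIMM of the same size."""
    return slots * dimm_gb


if __name__ == "__main__":
    # 24 slots of 64 GB DIMMs: the "afternoon build" from the post.
    print(total_memory_gb(24, 64) / 1024, "TB")

    # Dual CPU, 8 channels each, ASSUMING 2 DIMMs per channel.
    slots = dimm_slots(sockets=2, channels_per_socket=8, dimms_per_channel=2)
    print(total_memory_gb(slots, 128) / 1024, "TB")
    print(total_memory_gb(slots, 256) / 1024, "TB")
```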

Is it possible something else is being un-trashed now? In the last hour it seems like used-space has climbed more than any incoming traffic could account for.

I am seeing the same here. My trash is dropping and my used space is increasing fast.

I checked my logs and I see they sent out another restore request a couple of hours ago. That’s a total of 3 in the last couple of days. I guess they really want their trash back.
My trash was already fully restored the first time (except maybe on my hashstore node; that thing is crazy and I don’t know if it works correctly). But maybe for bigger nodes they have to keep sending restore commands, just in case a slow bloom filter was still being processed or the node was offline during the first request :thinking:?

Today my nodes had around 500 GB of trash. Now it’s only about 50 MB. But average disk usage is pretty low. This looks like some bigger issue :thinking:

I think that may be it. Whatever happened that they wanted all that trash back…with over 26k nodes they won’t all be online all the time. Maybe this is just a second pass to catch the stragglers.

This is necessary because Storj doesn’t know whether your nodes have already processed their bloom filters or not. Imagine what would happen if a restore request arrived before a filter finished processing on a node with slow I/O.
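
The race described above can be sketched with a toy model. Everything here is hypothetical: the `Node` class, its method names, and the use of a plain set in place of a real probabilistic bloom filter are illustration-only assumptions, not the storagenode implementation.

```python
class Node:
    """Toy storage node: every piece is either live or in trash."""

    def __init__(self, pieces):
        self.live = set(pieces)
        self.trash = set()

    def apply_bloom_filter(self, keep):
        """Garbage collection: move every live piece not in `keep` to trash."""
        garbage = self.live - keep
        self.live -= garbage
        self.trash |= garbage

    def restore_trash(self):
        """Move everything in trash back to live. Safe to call repeatedly."""
        restored = len(self.trash)
        self.live |= self.trash
        self.trash.clear()
        return restored


node = Node({"a", "b", "c", "d"})
bad_filter = {"a", "b"}  # a faulty filter that wrongly omits c and d

# The restore request arrives while the slow node is still processing the filter.
first = node.restore_trash()

# The filter finishes afterwards and trashes pieces that should have been kept.
node.apply_bloom_filter(bad_filter)
stranded = set(node.trash)

# A repeated restore request catches what the first one missed.
second = node.restore_trash()
```

In this model the first restore finds nothing to do, the late filter strands two pieces in trash, and only the second restore brings them back. Because restoring is idempotent, sending it several times costs little and covers slow or offline nodes.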

So you’re saying a trash restore wouldn’t cancel pending bloom filters? I don’t know the code, but wouldn’t it be smart to implement trash restore as an emergency stop for the currently running bloom filter (and any still pending)?

It could, but it’s still better not to put assumptions like that in an operation that is supposed to be a fault recovery procedure.

I have a few nodes and they usually sit on 10-15TB trash at any given time.

Currently at 3.65TB total trash and decreasing fast, at this rate we will hit 0 within 1-2 hours.

So… they had some server problems generating the bloom filter, then they restored all the trash… Can we assume the bloom filter was bad, wrongly generated, or incomplete, and the pieces were wrongly deleted by our nodes?
I hope we didn’t lose pieces. :cold_face:

I would make the decision to restore the trash if there was even a very small chance that the filters were generated incorrectly, like you say.

With that said, I haven’t really seen any abnormal trashing on my nodes and the trash was restored successfully. I think we’re fine. The Storj team could monitor for slightly higher failed audit rates, which would have started to show up already if things were not fine. Increasing the repair threshold should help to eventually fix things in that case, I guess?

It’s about time Storj released a statement on this.

In a normal situation, that many pieces failing to restore from trash could DQ a node. I found that error on all my nodes.
https://forum.storj.io/t/nice-restore-everything-from-trash-again/29264?u=snorkel

Looks like node operators are storing data for free again? At the moment, the actual usage of the space exceeds the paid values by approximately 15-20%.

If trash gets restored you’ll get paid for it. But satellites aren’t sending out usage stats (again).
