So what happens when a node at v1.101 is stopped, removed, restarted and rolls back to v1.99? Can it recognise the new directory structure? Will it revert to the old trash directories?
Are we expecting new bloom filters for US1 today? I’d like to keep an eye on that node again. I see that a few of my nodes have been updated to v1.101.3, but not the one I was monitoring yet. Another one with a big discrepancy has, though. But I’m guessing this round of BFs will still stick to the 4100003 limit?
You got these results with … ?
I updated the link to point to the corrected PowerShell script.
Yes, it will. The current plan is to run a separate bloom filter generator with a 10 MB limit, but also keep the original BF generator for safety (a 10 MB run might take too long, and it also requires more memory).
When we have the bigger bloom filters, we can send them to test users / volunteers and double-check the behavior. But generating the 10 MB filters will take an additional 1-2 weeks (we need to set up a database restored from the backup, and generation itself may take 5-? days…
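For a rough sense of scale, the standard bloom filter sizing formula already shows why the current size cap struggles on large nodes. A minimal sketch in Go (the 10% false-positive target is an assumption for illustration, not necessarily the satellite's actual setting):

```go
package main

import (
	"fmt"
	"math"
)

// optimalBits returns the number of bloom filter bits needed to cover n pieces
// at a target false-positive rate p (standard formula m = -n*ln(p)/(ln 2)^2).
func optimalBits(n, p float64) float64 {
	return -n * math.Log(p) / (math.Ln2 * math.Ln2)
}

func main() {
	const falsePositiveRate = 0.1 // assumed target, for illustration only
	for _, pieces := range []float64{1e6, 10e6, 40e6} {
		bits := optimalBits(pieces, falsePositiveRate)
		fmt.Printf("%3.0fM pieces -> ~%.1f MB filter\n", pieces/1e6, bits/8/1e6)
	}
}
```

Under these assumptions, a node holding tens of millions of pieces on one satellite simply can't be described accurately by a ~4 MB filter, which is why the 10 MB variant is being built.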
Sounds good. I have 2 similarly sized nodes with large uncollected garbage amounts. I’d love to test the 10 MB bloom filter on one and compare the difference. Let me know how I can help with testing.
Your help will be priceless, because, you know, emulation/mocking is one thing, but a real node is completely different.
I don’t think @BrightSilence’s situation is unique; I have almost the same stats as he does, which indicates this is largely a systemic problem. It’s very possible that solving his problem will solve it for us all.
So the rest of us will be getting typical mass-produced consumer bloom filters… but you’ll be getting the good stuff? Like artisanal hand-crafted gluten-free free-range pesticide-free organic bloom filters… made in small batches… infused with love… just for you?
Must be nice to be special
It can be poisonous too. So it’s better to have one test subject than to kill the whole population with a new product.
He is ONE of a kind, and a kind human being. He’s one of the few who has been around for a long time. He has read the whitepaper and suggested many solutions for problems that we as SNOs have faced. He is also polite and a thorough professional when passing errors on to the dev team, rather than using words like “scam”, saying Storj is “this and that”, claiming how easy it is to “fix” certain things, or “I have N years of experience as a coder”.
TL;DR:
Oops. Seems I meet all requirements?!
But I still have no idea how it works…
Ok.
We will wait for the next BF (v1.101.x is a mandatory requirement).
Awesome Alexey is the Standard.
- ISO 9001, ISO 27000 certified.
Okay, hang on. I’m not special. @elek mentioned they would want to test with test users or volunteers and I volunteered. As can anyone else. I’m not sure why you assume you can’t just offer to test as well. At this point, @elek hasn’t even responded yet and I don’t even have a clue on whether one of my nodes will be used for testing or what they need me to do to volunteer, so your comment is very premature.
You’re very kind. And I appreciate your words, but I didn’t really know which part of your message to quote as I feel like I’m just another person here in a community of great people. Thanks though.
It certainly is a larger problem and, going by the previous conversations, Storj is very aware of that. It impacts all nodes with large amounts of pieces on one satellite. They are working on a solution for all nodes, with the larger bloom filters sent through a new method. Hopefully that will go live soon after testing.
Not that aware, I don’t think. If they were, this is what they should already be doing:
- Run several real nodes (not seeded/fake nodes) on some sort of snapshotting filesystem, so they could travel back in time and test exactly what was going on at that particular moment. I heard they also use CockroachDB; great, CockroachDB has time-travel queries too, so they could restore a backup from a particular time and freeze it for extended testing.
- Throw s*** at those few nodes, e.g. power them down while GC is running, or at random times when the software is most vulnerable; over time, problems will start to emerge…
Even on a node I started less than a month ago I still experience this issue, so this is certainly not just a large-node problem. cc @Alexey
All due respect, but I have no idea what you are talking about. What we are discussing here is, at this point, a completely understood issue caused by bloom filters that are too small. The behavior of only low percentages of trash being cleaned up on my node, for example, roughly matches the theoretical calculation posted by elek. Larger bloom filters couldn’t be sent before due to limits in DRPC. Those have been resolved for later node versions, and that’s going to be tested with a larger bloom filter.
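To put rough numbers on “too small”: the false-positive rate of a fixed-size filter climbs quickly as the piece count grows, and every false positive is a garbage piece the node wrongly keeps. A back-of-the-envelope sketch (the piece counts and the hash count k are illustrative assumptions, not values from the actual generator):

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate estimates the chance that a deleted piece still "matches"
// an m-bit filter built over n live pieces with k hash functions, i.e. the
// fraction of garbage that GC would wrongly keep.
func falsePositiveRate(n, m, k float64) float64 {
	return math.Pow(1-math.Exp(-k*n/m), k)
}

func main() {
	const filterBytes = 4100003   // the size limit mentioned earlier in the thread
	m := float64(filterBytes) * 8 // bits available
	k := 9.0                      // assumed hash count, for illustration only
	for _, n := range []float64{3e6, 10e6, 30e6} {
		fmt.Printf("%3.0fM live pieces -> ~%.0f%% of garbage survives GC\n",
			n/1e6, 100*falsePositiveRate(n, m, k))
	}
}
```

With these assumed parameters, the cleanup percentage collapses somewhere around 10M pieces per satellite, which is roughly the pattern being reported on large nodes.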
Can someone answer this question?
When will docker images be updated?
Hello, maybe I’m speaking out of place here, but wouldn’t unforeseeable changes in the future introduce unforeseeable bugs?
I’m talking about testing methodology here. If they also ran a few nodes, then when enough users report an issue they could pull numbers from their real nodes to see whether those nodes experience it too, AND they could freeze their environment, both the database on their server and the real filesystem on the node. That way they could quickly grasp what’s going on and introduce a fix much faster. That’s the main idea…
P.S.: for example, they could use ZFS for the node and snapshot the filesystem daily; CockroachDB on their server can also be backed up and restored to another database instance. Then they would have complete control over their environment and could do extended testing until they figure out the problem. This idea isn’t crazy, right? I used to be a DevOps engineer; this is how I isolated problems for my SWEs…
WHATEVER.
Just send us some bloom filters for US1 today or tomorrow please; so far I didn’t notice any this weekend, or idk.
I don’t disagree with that testing approach, it’s just not really relevant to the subject being discussed here. I also know they definitely run nodes, though I’m not sure if they manage them in such a way. It’s a little off topic though and I’d rather focus on helping with the larger bloom filters here as Storjlings are being responsive and I don’t want to make them wade through other discussions in this topic to get to the relevant posts. So I’ll leave it at that here.
If the bloom filter refers to all the pieces that must be retained at the time of generation, aka t, and the node starts GC at t + x seconds and finishes it at t + y minutes, then during all this time (y minutes) the node receives new pieces that are not referenced by the bloom filter. What happens to them?
Are they ignored because they are newer than the bloom filter generation time (t)?
Or is this an oversight by the dev team (which I doubt)? I know that they are retained, but I’m curious about the process.
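My guess at the process (a minimal sketch; the type and field names below are illustrative, not the actual storagenode code): the satellite stamps the filter with its creation time t, and the retain job only evaluates pieces older than that stamp, so anything uploaded during or after generation is left untouched.

```go
package main

import (
	"fmt"
	"time"
)

// Piece and RetainRequest are illustrative stand-ins, not the real storagenode types.
type Piece struct {
	ID       string
	Modified time.Time
}

type RetainRequest struct {
	CreatedBefore time.Time            // when the satellite built the filter (t)
	Keep          func(id string) bool // bloom filter membership test
}

// gcDecision sketches the expected rule: pieces newer than the filter's
// creation time are skipped outright; older pieces are trashed only when the
// filter does not claim them.
func gcDecision(p Piece, req RetainRequest) string {
	if p.Modified.After(req.CreatedBefore) {
		return "skip (newer than the filter, cannot be garbage yet)"
	}
	if req.Keep(p.ID) {
		return "keep (satellite still references it)"
	}
	return "move to trash"
}

func main() {
	t := time.Now().Add(-48 * time.Hour) // filter generated two days ago
	req := RetainRequest{
		CreatedBefore: t,
		Keep:          func(id string) bool { return id == "old-live" },
	}
	for _, p := range []Piece{
		{ID: "old-live", Modified: t.Add(-time.Hour)},
		{ID: "old-deleted", Modified: t.Add(-time.Hour)},
		{ID: "fresh-upload", Modified: t.Add(30 * time.Minute)},
	} {
		fmt.Println(p.ID+":", gcDecision(p, req))
	}
}
```

In this sketch the fresh upload is never even tested against the filter, which would explain why pieces received during GC stay safe even though the filter knows nothing about them.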