Current situation with garbage collection

I do not think so…
But yes, you can run it manually (not sure if it would help?):

However, I’m not sure that it could help with a BF, because it’s sent from the satellites… and if your node version doesn’t support storing it locally… well, it will be forced to wait for the next one.

2 Likes

So, hypothetically, if a node is very big and too slow to finish a GC, and the next BF comes in, the GC starts again, and so on, and it never finishes.
I guess this could maybe be solved if the GC did partial deletes and didn’t wait for the whole run to finish before moving all the garbage to trash at once. This way, even if it doesn’t finish in time, there will still be pieces moved to trash, and only a small percentage of the pieces that should be trashed will be kept. Maybe some dev can shed some light on this case.

I think this is what will happen:

1 Like

I would guess so, but you have two options:

  1. Wait until a proper GC is implemented for big nodes.
  2. Reduce the allowed amount of space to be used.

I do not see any other solution so far (I do not have any node which could be an exception so far).

This change should improve things!
You will have ONE GC filewalker, which moves pieces to the trash, instead of TWO:

  • the gc-filewalker, which considers what could be moved to the trash;
  • AND the retention filewalker, which ACTUALLY moves pieces to the trash.

So, it must improve things.
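
Conceptually, something like this (just a sketch in Go, not the actual storagenode code; the names and signatures are made up for illustration):

package gcsketch

// inFilter and moveToTrash stand in for the real bloom filter and trash
// APIs; this only illustrates one walk versus two, not the real code.

// Before: two steps, as described above.
func gcThenRetain(blobs []string, inFilter func(string) bool, moveToTrash func(string)) {
    // gc-filewalker: only decides which pieces could be moved to the trash.
    var candidates []string
    for _, id := range blobs {
        if !inFilter(id) {
            candidates = append(candidates, id)
        }
    }
    // retention step: actually moves the pieces to the trash.
    for _, id := range candidates {
        moveToTrash(id)
    }
}

// After: a single filewalker that decides and moves in the same pass.
func gcAndRetain(blobs []string, inFilter func(string) bool, moveToTrash func(string)) {
    for _, id := range blobs {
        if !inFilter(id) {
            moveToTrash(id)
        }
    }
}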

I can’t see anything like that here: storagenode/retain: add more logging to GC filewalker · storj/storj@8a7b305 · GitHub or here storagenode/{pieces,blobstore}: save-state-resume feature for GC file… · storj/storj@0f90f06 · GitHub or here storagenode/pieces/lazyfilewalker: more logging to GC filewalker · storj/storj@780df77 · GitHub

To distinguish a resume from a fresh start, and to monitor that the resume is working as expected, a log entry would be good.

2 Likes

I’m seeing the exact same numbers on my big nodes. I probably have over 100 TB of unpaid data across all nodes; I’m not impressed at all by Storj’s calculations. It’s a joke for operators in its current state.

3 Likes

+1

(Post must be at least 20 characters)

3 Likes

Shouldn’t GC’s failure be logged at INFO level rather than DEBUG level?

log.Debug("gc-filewalker failed", zap.Error(err))
1 Like

I would log it as a warning or an error, because if it always fails, it will affect the functionality of the node…
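
For example, with zap (the same logger as in the snippet above), that could look like this (just a sketch):

log.Warn("gc-filewalker failed", zap.Error(err))
// or, if a permanently failing GC should be treated as a real problem:
log.Error("gc-filewalker failed", zap.Error(err))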

Can you please share the number of piece files (or used blobs space) before and after?

Looks like you have 11.13 TB of blobs usage after the GC run, which deleted 2890475 files. And all together you have 39M pieces.

With 39M pieces, the false positive rate is 44% (66% of the trash couldn’t be found).

But we use different seeds for different BFs, so multiple BF runs will clean it up.

Agreed. We are following the progress of the rollout. Big nodes are usually up to date, so we can probably start using the new filters soon…

1 Like

Those two numbers don’t add up to 100%. I’m guessing it should be 34% removed vs 66% that couldn’t be cleaned up. My quick and dirty calculation looked a little worse, but to be fair, I did assume all garbage was from US1 in that calculation. Still, even if 44% gets cleaned up, that’s not exactly a matter of “multiple runs will get it”. With that percentage, there would always be a significant amount of garbage. I’ll keep an eye out for the updates. Hopefully you decide to send out the larger BFs after the rollout is complete.

1 Like

Yes, and we hope to solve it soon.

2 Likes

I understand and I’m not complaining. I was just responding to this line.

I should probably have quoted it in my previous message. But I don’t agree with that assessment. Multiple runs won’t be enough to clean it up, especially when new deletes keep coming in.

But I think you are all aware of that and working on a better solution, so the point might be moot.

Well, it depends on the number of new deletions. Because I love the technical details, let me add some more information :wink:

Here is a simple simulation with a spreadsheet:

Let’s say you have 25M real blobs and 14M blobs which should be deleted (which is a very rare and bad situation).

With 39M (25M + 14M) blobs, the false positive rate of a 4100003-byte BF is 66%.

The 25M will always be kept (guaranteed by the BF), but only 34% of the 14M will be properly deleted.

So we will have 25M + 14M * 0.66 + newly deleted blobs for the next BF.

Here the code has a nice trick: next time a different seed is used for the BF, so the pieces affected by false-positive hits will be different. There is a good chance that some of the previously missed pieces will be deleted this time.

As you can see in the garbage column, the original 14M slowly decreases (it takes 10 BF executions to get down to only 269k, as long as only 100k pieces are deleted during one BF period).

And yes, it depends on the number of new deletions, but even with 1M new deletions, it will work eventually…

That’s the math. We have hope.

But there is no question: we should use the new protocol ASAP, as a 66% false positive rate is unacceptable (even if it’s 0.66 * 0.66 the second time).

Note: the false-positive rate is calculated with the spreadsheet formula =POW(2,4100003/sum(E2:F2)/-1.44*8), i.e. p ≈ 2^(-(4100003 * 8) / (1.44 * n)), where n is the total number of pieces.
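
The same simulation as Go code (just a rough sketch with the example numbers from above, not production code):

package main

import (
    "fmt"
    "math"
)

func main() {
    const filterBits = 4100003 * 8 // BF size in bits
    kept := 25e6                   // real blobs, always kept (the BF has no false negatives)
    garbage := 14e6                // blobs that should be deleted
    newDeletes := 100e3            // new garbage arriving during each BF period

    for run := 1; run <= 10; run++ {
        total := kept + garbage
        // same formula as the spreadsheet: p ≈ 2^(-bits / (1.44 * n))
        p := math.Pow(2, -filterBits/(1.44*total))
        // false positives survive this run, the rest is moved to the trash
        garbage = garbage*p + newDeletes
        fmt.Printf("BF run %2d: false-positive rate %.0f%%, garbage left %.0f\n", run, p*100, garbage)
    }
}

It prints roughly the same decreasing garbage column as the spreadsheet (around 270k pieces left after 10 runs, matching the 269k above up to rounding).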

5 Likes

I didn’t mean to put you up to so much work with my comment. :laughing:
Of course I agree with all that and you’re right. I am aware of how the system works, including the random seed. But before I could respond, there was already a comment from someone else showing pretty clearly that that isn’t good enough. Time matters, and deletions of free-account data are still planned as well, so more large deletions are expected.

I for one don’t need to be convinced that you are working hard on better solutions either, and I don’t discount the value of this stopgap solution. Anything that frees up some space will help, as it gives full nodes new room to grow. That helps a lot as a temporary measure. But it doesn’t stop me from looking forward to the full solution soon.

Anyway, keep up the good work and sorry if I took up your valuable time by sparking that calculation. :wink: I personally don’t appreciate when someone brings an argument without calculations and expects detailed numbers in response. So that wasn’t my intention, but your efforts are appreciated anyway.

I do love having this little nugget of knowledge though. :grin:

1 Like

I don’t know if GC run time is an issue for now; I only know about the FW. But I will throw this idea at you anyway…
What if we store pieces in daily directories?
It would make GC scan only those directories that correspond to the creation dates of the pieces that must be kept/deleted…
We know that old data remains untouched for a long time, and the majority of deletes are of recent data. Why scan old untouched directories? The pieces that must be deleted would be concentrated in a few recent directories.
Directory structure variants:

  1. /data/US1/2024/12/31/… pieces
  2. /data/US1/2024-12-31/… pieces

UTC can be used to choose the right date, and GC can check the current day’s folder and the next one, for pieces that slipped into the next day.
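
Just to illustrate the idea (purely hypothetical; this is not how the storagenode lays out pieces today, and the names are made up):

package layoutsketch

import (
    "path/filepath"
    "time"
)

// pieceDir builds variant 1 from the list above: the directory is derived
// from the piece's creation time in UTC, e.g. /data/US1/2024/12/31
// ("US1" standing for the satellite).
func pieceDir(root, satellite string, created time.Time) string {
    return filepath.Join(root, satellite, created.UTC().Format("2006/01/02"))
}

The idea being that a GC run would then mostly have to walk the most recent date folders instead of the whole blobs tree.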

wouldn’t you run into filesystem issues with loads of folders?

2 Likes

Creation dates are going to be all over the place, and have almost zero relation to what customers choose to delete at any given time. And… any information about dates is space that would get taken up instead of the normal BF details (as we’re already bumping up against max BF sizes).

So you’d reduce the precision of BF data (by having less room to include it)… and instead you’d have more date data: which is going to be near-random anyways?

1 Like

So I get that it’s a bad idea. :sweat_smile: