Current situation with garbage collection

I do not think so…
But yes, you can run it manually (not sure if it would help?):

However, I’m not sure that it could help with a BF, because it’s sent from the satellites… and if your node version doesn’t support storing it locally… well, it will be forced to wait for the next one.

2 Likes

So, hypothetically, if a node is very big and too slow to finish a GC, and the next BF comes in, the GC starts again, and so on, and it never finishes.
I guess this could maybe be solved if the GC did partial deletes and didn’t wait for the whole run to finish before moving all the garbage to trash at once. This way, even if it doesn’t finish in time, there will still be pieces moved to trash, and only a small percentage of the pieces that should be trashed will be kept. Maybe some dev can shed some light on this case.

I think this is what will happen:

1 Like

I would guess so, but you have two options:

  1. Wait until a proper GC is implemented for big nodes.
  2. Reduce the allowed amount of space to be used.

I do not see any other solution so far (I do not have any node which could be an exception so far).

This change should improve things!
You will have ONE GC filewalker, which moves pieces to the trash, instead of TWO:

  • the gc-filewalker, which considers what could be moved to the trash;
  • AND the retention filewalker, which ACTUALLY moves pieces to the trash.

So, it must improve things.
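
Conceptually, something like this (just a sketch in Go, not the actual storagenode code; the names and signatures are made up for illustration):

package gcsketch

// inFilter and moveToTrash stand in for the real bloom filter and trash
// APIs; this only illustrates one walk versus two, not the real code.

// Before: two steps, as described above.
func gcThenRetain(blobs []string, inFilter func(string) bool, moveToTrash func(string)) {
    // gc-filewalker: only decides which pieces could be moved to the trash.
    var candidates []string
    for _, id := range blobs {
        if !inFilter(id) {
            candidates = append(candidates, id)
        }
    }
    // retention step: actually moves the pieces to the trash.
    for _, id := range candidates {
        moveToTrash(id)
    }
}

// After: a single filewalker that decides and moves in the same pass.
func gcAndRetain(blobs []string, inFilter func(string) bool, moveToTrash func(string)) {
    for _, id := range blobs {
        if !inFilter(id) {
            moveToTrash(id)
        }
    }
}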

I can’t see anything like that here: storagenode/retain: add more logging to GC filewalker · storj/storj@8a7b305 · GitHub or here storagenode/{pieces,blobstore}: save-state-resume feature for GC file… · storj/storj@0f90f06 · GitHub or here storagenode/pieces/lazyfilewalker: more logging to GC filewalker · storj/storj@780df77 · GitHub

To distinguish a resume from a fresh start, and to monitor that the resume is working as expected, a log entry would be good.

2 Likes

I’m seeing the exact same numbers on my big nodes. I probably have over 100 TB of unpaid data across all nodes; I’m not impressed at all by Storj’s calculations. It’s a joke for operators in its current state.

3 Likes

+1

(Post must be at least 20 characters)

3 Likes

Shouldn’t GC’s failure be logged at INFO level rather than DEBUG level?

log.Debug("gc-filewalker failed", zap.Error(err))
1 Like

I would log it as a warning or an error, because if it always fails, it will affect the functionality of the node…
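
For example, with zap (the same logger as in the snippet above), that could look like this (just a sketch):

log.Warn("gc-filewalker failed", zap.Error(err))
// or, if a permanently failing GC should be treated as a real problem:
log.Error("gc-filewalker failed", zap.Error(err))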

Can you please share the number of piece files (or used blobs space) before and after?

Looks like you have 11.13 TB of blobs usage after the GC run, which deleted 2890475 files. And all together you have 39M pieces.

With 39M pieces, the false positive rate is 44% (66% of the trash couldn’t be found).

But we use different seeds for different BFs, so multiple BF runs will clean it up.

Agreed. We are following the progress of the rollout. Big nodes are usually up to date, so we can probably start using the new filters soon…

1 Like

Those two numbers don’t add up to 100%. I’m guessing it should be 34% removed vs 66% that couldn’t be cleaned up. My quick and dirty calculation looked a little worse, but to be fair, I did assume all garbage was from US1 in that calculation. Still, even if 44% gets cleaned up, that’s not exactly a matter of “multiple runs will get it”. With that percentage, there would always be a significant amount of garbage. I’ll keep an eye out for the updates. Hopefully you decide to send out the larger BFs after the rollout is complete.

1 Like

Yes, and we hope to solve it soon.

2 Likes

I understand and I’m not complaining. I was just responding to this line.

I should probably have quoted it in my previous message. But I don’t agree with that assessment. Multiple runs won’t be enough to clean it up, especially when new deletes keep coming in.

But I think you are all aware of that and working on a better solution, so the point might be moot.

Well, it depends on the number of new deletions. Because I love the technical details, let me add some more information :wink:

Here is a simple simulation with a spreadsheet:

Let’s say you have 25M real blobs and 14M blobs which should be deleted (which is a very rare and bad situation).

With 39M (25M + 14M) blobs, the false positive rate of a 4100003-byte BF is 66%.

The 25M will always be kept (guaranteed by the BF), but only 34% of the 14M will be properly deleted.

So we will have 25M + 14M * 0.66 + newly deleted blobs for the next BF.

Here the code has a nice trick: next time a different seed is used for the BF, so the pieces affected by false-positive hits will be different. There is a good chance that some of the previously missed pieces will be deleted this time.

As you can see in the garbage column, the original 14M slowly decreases (it takes 10 BF executions to get down to only 269k, as long as only 100k pieces are deleted during one BF period).

And yes, it depends on the number of new deletions, but even with 1M new deletions, it will work eventually…

That’s the math. We have hope.

But there is no question: we should use the new protocol ASAP, as a 66% false positive rate is unacceptable (even if it’s 0.66 * 0.66 the second time).

Note: the false-positive rate is calculated with the spreadsheet formula =POW(2,4100003/sum(E2:F2)/-1.44*8), i.e. p ≈ 2^(-(4100003 * 8) / (1.44 * n)), where n is the total number of pieces.
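
The same simulation as Go code (just a rough sketch with the example numbers from above, not production code):

package main

import (
    "fmt"
    "math"
)

func main() {
    const filterBits = 4100003 * 8 // BF size in bits
    kept := 25e6                   // real blobs, always kept (the BF has no false negatives)
    garbage := 14e6                // blobs that should be deleted
    newDeletes := 100e3            // new garbage arriving during each BF period

    for run := 1; run <= 10; run++ {
        total := kept + garbage
        // same formula as the spreadsheet: p ≈ 2^(-bits / (1.44 * n))
        p := math.Pow(2, -filterBits/(1.44*total))
        // false positives survive this run, the rest is moved to the trash
        garbage = garbage*p + newDeletes
        fmt.Printf("BF run %2d: false-positive rate %.0f%%, garbage left %.0f\n", run, p*100, garbage)
    }
}

It prints roughly the same decreasing garbage column as the spreadsheet (around 270k pieces left after 10 runs, matching the 269k above up to rounding).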

5 Likes

I didn’t mean to put you up to so much work with my comment. :laughing:
Of course I agree with all that and you’re right. I am aware of how the system works, including the random seed. But before I could respond, there was already a comment from someone else showing pretty clearly that that isn’t good enough. Time matters, and deletions of free-account data are still planned as well, so more large deletions are expected.

I for one don’t need to be convinced that you are working hard on better solutions either, and I don’t discount the value of this stopgap solution. Anything that frees up some space will help, as it gives full nodes new room to grow. That helps a lot as a temporary measure. But it doesn’t stop me from looking forward to the full solution soon.

Anyway, keep up the good work and sorry if I took up your valuable time by sparking that calculation. :wink: I personally don’t appreciate when someone brings an argument without calculations and expects detailed numbers in response. So that wasn’t my intention, but your efforts are appreciated anyway.

I do love having this little nugget of knowledge though. :grin:

1 Like

I don’t know if GC run time is an issue for now; I only know about the FW. But I will throw this idea at you anyway…
What if we store pieces in daily directories?
It would make GC scan only those directories that correspond to the creation dates of the pieces that must be kept/deleted…
We know that old data remains untouched for a long time, and the majority of deletes are of recent data. Why scan old untouched directories? The pieces that must be deleted would be concentrated in a few recent directories.
Directory structure variants:

  1. /data/US1/2024/12/31/… pieces
  2. /data/US1/2024-12-31/… pieces

UTC can be used to choose the right date, and GC can check the current day’s folder and the next one, for pieces that slipped into the next day.
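
Just to illustrate the idea (purely hypothetical; this is not how the storagenode lays out pieces today, and the names are made up):

package layoutsketch

import (
    "path/filepath"
    "time"
)

// pieceDir builds variant 1 from the list above: the directory is derived
// from the piece's creation time in UTC, e.g. /data/US1/2024/12/31
// ("US1" standing for the satellite).
func pieceDir(root, satellite string, created time.Time) string {
    return filepath.Join(root, satellite, created.UTC().Format("2006/01/02"))
}

The idea being that a GC run would then mostly have to walk the most recent date folders instead of the whole blobs tree.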

wouldn’t you run into filesystem issues with loads of folders?

2 Likes

Creation dates are going to be all over the place, and have almost zero relation to what customers choose to delete at any given time. And… any information about dates is space that would get taken up instead of the normal BF details (as we’re already bumping up against max BF sizes).

So you’d reduce the precision of BF data (by having less room to include it)… and instead you’d have more date data: which is going to be near-random anyways?

1 Like

So I get that it’s a bad idea. :sweat_smile: