The bloom filter will only trash pieces with a modification time before its creation time.
See the following code snippet: storj/storagenode/pieces/filewalker.go at commit 8d6eed6169659aee3388ee68cbff6b1cb4a69c54 in storj/storj on GitHub.
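For illustration, here is a minimal sketch of that check in Go (the name isTrashCandidate and the parameters are my own, not the actual identifiers in filewalker.go):

```go
package main

import (
	"fmt"
	"time"
)

// isTrashCandidate sketches the check referenced above (illustrative names,
// not the actual storj/storj identifiers): a piece is only moved to trash
// when the bloom filter does NOT contain its ID AND its modification time
// is before the filter's creation time.
func isTrashCandidate(inFilter bool, modTime, filterCreated time.Time) bool {
	return !inFilter && modTime.Before(filterCreated)
}

func main() {
	created := time.Date(2024, 4, 13, 0, 0, 0, 0, time.UTC)

	// Uploaded after the filter was created: kept even if not in the filter.
	fmt.Println(isTrashCandidate(false, created.Add(24*time.Hour), created)) // false

	// Old piece that the filter does not contain: trashed.
	fmt.Println(isTrashCandidate(false, created.Add(-48*time.Hour), created)) // true
}
```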
Why isn’t the trash emptying?
All nodes have garbage in these folders. 2-3 terabytes each.
\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-04-13
\trash\v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa\2024-04-14
7 days have passed.
13 + 7 = 20 => if the trash deleter says “I need to delete anything older than 7 days” and runs on the 20th, those files aren’t supposed to be deleted yet. They will be deleted on the run on the 21st.
For the 14th, they will be deleted on the next run that meets the requirement, i.e. on the 22nd.
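To make the date math concrete, here is a minimal sketch of that rule (my own illustration, assuming the chore compares per-day folder dates against a 7-day retention; not the actual storagenode code):

```go
package main

import (
	"fmt"
	"time"
)

// trashExpiry mirrors the "older than 7 days" rule discussed above.
const trashExpiry = 7 * 24 * time.Hour

// shouldDelete sketches the trash chore's decision (illustrative, not the
// actual storagenode code): a per-day trash folder is deleted only once it
// is strictly older than the retention interval at the time of the run.
func shouldDelete(folderDate, runTime time.Time) bool {
	return runTime.Sub(folderDate) > trashExpiry
}

func main() {
	folder := time.Date(2024, 4, 13, 0, 0, 0, 0, time.UTC) // .../2024-04-13
	run20 := time.Date(2024, 4, 20, 0, 0, 0, 0, time.UTC)
	run21 := time.Date(2024, 4, 21, 0, 0, 0, 0, time.UTC)
	fmt.Println(shouldDelete(folder, run20)) // false: exactly 7 days old, kept
	fmt.Println(shouldDelete(folder, run21)) // true: older than 7 days, deleted
}
```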
It sounds a bit like you are misunderstanding our careful approach. We tested the bigger bloom filters with storj-up. But why should we rush the rollout now? A good test is not a replacement for a good rollout strategy. → Send the bigger bloom filters to a few volunteer nodes first to double-check our test results.
We have several real nodes - almost every employee runs at least one Pi node to be aware of problems which can affect the Community, and I’m not an exception. I run three - a Windows one and two Docker for Windows nodes. I also had a Pi one, but the SD card died and my node with it - I’m too far away from that location to fix the issue (it’s simple though - re-flash the SD card and run the node - but it requires physical access, which is not possible in the current situation).
Likely - yes. The BF refers only to past pieces (about a week old).
Hi, let’s move the conversation about the new method to this topic: Suggestion on testing on production method
What you are doing is correct; I’m suggesting a new method of testing on production, though.
Oh, this is another chore. It’s not related to GC and BF; it will simply wipe out anything that’s older than 7 days after moving to the trash.
Did the bloom filters get sent out this week? Not sure if it’s just me, but none of my nodes have received the filter from any satellite. All my nodes are on v1.101.3.
It should have been; at least my nodes received it:
2024-04-18T10:58:21Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode", "piecesCount": 854359, "piecesSkippedCount": 0}
2024-04-18T10:58:21Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Does PiecesContentSize refer to the size of the pieces in bytes?
It should, did you see a discrepancy?
Checking 1 of my nodes (1.99.3):
Yes, for that 1 satellite ending in “A6”, on 18-04-2024
it got “INFO retain” prepared and successfully moved 2 times:
at 4:05 and the 2nd time around 12:00.
But that’s it!
From 18-04-2024 to today,
only for 1 satellite, and 2 times.
No other retain for the other satellites?
EDIT 22.04.2024: they have received them now; all nodes are spinning at 100% atm.
I know I’m replying about 8h after your post, but I’m seeing a gradual decrease in disk space used. Maybe they came out?
Yes, I have started to receive filters for US1. I guess we’ll see EU1 tomorrow. The bloom filter size has gotten slightly smaller, at 4099953.
However, the deployment from v1.99.3 to v1.101.3 is going a lot faster than it was from v1.95 to v1.99, so hopefully the larger bloom filters will be standard soon.
Also, moving pieces to trash immediately, and not only after the gc-filewalker completes, is a great change!
As a rule of thumb: don’t downgrade. It’s usually not tested, and not safe.
It may happen very rarely, when we stop and roll back a rollout. And it’s always scary.
During normal operation your node shouldn’t go back to 99 from 101 (the rollout uses the node ID to return the same version all the time, until the seed is changed - see the sketch below).
In this particular case: I assume 99 won’t see the new style of trash folders; they will remain there. I’m not sure if the migration to the new structure will work if the process tries to migrate again and the new directories are already created. But maybe…
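For illustration, here is a minimal sketch of how a node-ID-pinned rollout decision can work. This is an assumption about the scheme (HMAC of the node ID with a seed, compared to a cursor), not the exact version-server code:

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// inRollout sketches the staged-rollout idea from the parenthetical above
// (an assumption about the scheme, not the exact storj version-server code):
// hash the node ID with the rollout seed and compare against a cursor. The
// result is deterministic per node, so the node gets the same version on
// every check until the seed changes.
func inRollout(seed, nodeID, cursor []byte) bool {
	mac := hmac.New(sha256.New, seed)
	mac.Write(nodeID)
	return bytes.Compare(mac.Sum(nil), cursor) <= 0
}

func main() {
	seed := []byte("hypothetical-rollout-seed") // illustrative value
	node := []byte("121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6")
	cursor := bytes.Repeat([]byte{0x80}, sha256.Size) // roughly 50% of nodes
	fmt.Println(inRollout(seed, node, cursor))        // same answer every time
}
```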
That’s good news. It means that you don’t need the full bloom filter (you have fewer pieces), which means that your false positive rate should be 0.1, as expected.
4100003 is the max limit; it is used only if the BF suggests a bigger optimal size. With a few million pieces, you may have a smaller size. That’s normal.
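For reference, a minimal sketch of the standard bloom filter sizing formula, m = -n·ln(p)/ln(2)² bits, with a cap applied as described; the 4100003 cap and the 0.1 rate come from this thread, and the satellite’s exact code may differ:

```go
package main

import (
	"fmt"
	"math"
)

// bloomFilterBytes sketches the standard sizing formula for a bloom filter,
// m = -n * ln(p) / (ln 2)^2 bits, capped at a maximum byte size as the post
// above describes. The cap of 4100003 bytes is taken from the discussion;
// the satellite's exact formula may differ.
func bloomFilterBytes(numPieces int, falsePositiveRate float64, maxBytes int) int {
	bits := -float64(numPieces) * math.Log(falsePositiveRate) / (math.Ln2 * math.Ln2)
	size := int(math.Ceil(bits / 8))
	if size > maxBytes {
		return maxBytes
	}
	return size
}

func main() {
	// A node with under a million pieces stays well below the cap...
	fmt.Println(bloomFilterBytes(854359, 0.1, 4100003)) // ~512 KB

	// ...while tens of millions of pieces hit it, which pushes the real
	// false positive rate above the intended 0.1.
	fmt.Println(bloomFilterBytes(50_000_000, 0.1, 4100003)) // 4100003
}
```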
I don’t downgrade manually. It just happens when I change something in the run command and recreate the container. But luckily, all my nodes didn’t go back and stayed at 101 after the restart.
Unfortunately, that’s not the case for the node I’ve been monitoring for this. It seems this run cleaned up even less (which is to be expected, since the node still receives new data, so the max size becomes less sufficient over time). I won’t bother you too much, so I’ll just post my results after the last run, which still left more than 2TB of uncollected garbage. Let me know if you need my node ID to test the large bloom filters. This node has been updated to v1.101.3.