Current situation with garbage collection

UPDATE: we generated 10 MB bloom filters for all storagenodes.

I tested the filter on one of my storagenodes.

I had 24,344,156 pieces (according to the satellite) and 29,566,764 piece files on the disk (21% overhead).

The process took ~1:30 and finished by deleting 3,675,550 pieces (leaving ~6% overhead, which is under the expected 10%).

2024-04-29T16:28:21Z	INFO	retain	Prepared to run a Retain request.	{"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-04-23T17:59:59Z", "Filter Size": 10000003, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-04-29T18:01:55Z	INFO	lazyfilewalker.gc-filewalker.subprocess	gc-filewalker completed	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "piecesCount": 29566764, "trashPiecesCount": 3675550, "piecesTrashed": 3675550, "piecesSkippedCount": 0, "Process": "storagenode"}

We are planning to send out the 10 MB filter to all storagenodes soon (be sure that you use the latest SN software, or enable auto-update).

If you have more than 29M pieces for the US1 satellite (check with `find . -type f | wc -l` from the blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa folder) and would be happy to test earlier, let me know your NodeId. I can send out BFs manually to a few volunteers.


Yes, please 124W7AHXrdWgwKceF6zPBm1LYjNc7WuEjyvUXbRDKGbMAodspiT

Awesome! Do we need 1.102? The cursor is at ae147ae147ae147ae147ae147ae147ae147ae147ae147ae147ae147ae147ae13 atm. Would be nice if that could finish first, if that’s required.

My node received the 10M bloom filter, but unfortunately it was restarted for an update while processing it, and it didn’t save it.

I can see this in the log:

2024-04-30T15:39:55Z	ERROR	pieces	lazyfilewalker failed	{"Process": "storagenode", "error": "lazyfilewalker: signal: killed", "errorVerbose": "lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkSatellitePiecesToTrash:160\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:578\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:369\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:258\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-04-30T15:39:55Z	ERROR	filewalker	failed to get progress from database	{"Process": "storagenode", "error": "gc_filewalker_progress_db: context canceled", "errorVerbose": "gc_filewalker_progress_db: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*gcFilewalkerProgressDB).Get:47\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:154\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:585\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:369\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:258\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-04-30T15:39:55Z	ERROR	filewalker	failed to reset progress in database	{"Process": "storagenode", "error": "gc_filewalker_progress_db: context canceled", "errorVerbose": "gc_filewalker_progress_db: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*gcFilewalkerProgressDB).Reset:58\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash.func1:171\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:244\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:585\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:369\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:258\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-04-30T15:39:55Z	ERROR	retain	retain pieces failed	{"Process": "storagenode", "cachePath": "config/retain", "error": "retain: filewalker: context canceled", "errorVerbose": "retain: filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:178\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:585\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:369\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:258\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

And me please
123nyNsQeE3HBwvpKK91uRCZ2y3bgG62asfur76fFumKQ2wbGBZ

Fortunately the command is in my shell history. Just sent it out again…

1.101 should be enough (this commit is required: https://review.dev.storj.io/c/storj/storj/+/12537)

I sent out BFs for the 3 nodes (thanks for volunteering!). I think it should be enough for now.

If there are no suspicious signs, we will send out BFs to everybody on Thursday (Wednesday is May Day).


10MB bloom filter successfully processed. It definitely cleaned up more than before, but it left way more than 10% behind unfortunately.

2024-04-30T16:33:38Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-04-23T17:59:59Z", "Filter Size": 10000003, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-04-30T21:45:51Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 7775774, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 34133480, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "5h12m12.637650404s", "Retain Status": "enabled"}

Prior to the run I had about 1.85TB of uncollected garbage. But it left about 592GB behind.

In your PM you mentioned the satellite has 22,929,122 pieces for this node. My node saw 34,133,480. So ideally it would have deleted 11,204,358 pieces. But it only removed 7,775,774 for about 69% (nice). Leaving 31% behind.
Side note: 592GB/1.85TB = 32% indicating that the way I derive uncollected garbage in the earnings calculator is quite accurate.
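The percentages above can be reproduced with a short snippet (numbers taken from the retain log line and the satellite figure quoted above):

```go
package main

import "fmt"

func main() {
	const (
		onDisk    = 34133480 // pieces the node saw on disk
		satellite = 22929122 // pieces the satellite knows about
		trashed   = 7775774  // pieces moved to trash by retain
	)
	garbage := onDisk - satellite // ideal number of deletions
	fmt.Printf("garbage: %d\n", garbage)
	fmt.Printf("collected: %.1f%%\n", 100*float64(trashed)/float64(garbage))
	fmt.Printf("left behind: %.1f%%\n", 100*float64(garbage-trashed)/float64(garbage))
}
```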

Are larger sizes than 10MB planned? If we’re bumping up to the limits of that already, it seems insufficient long term.


Well put. This needs more long-term planning and more resources for code development on the storagenode side.


That is somewhat worse than I expected. I just checked, and the expected false-positive rate is 18.6% with a 10 MB BF for 22.9M pieces. It’s not an exact number but a probability; still, the observed 31% is higher…

This is my opinion, but I would focus first on adopting the 10 MB filters everywhere.

Bumping the BF size further is certainly an option (but beyond a certain size we hit hardware limitations and need some more code optimization).

So it’s an option, but not the only option. The hardware for BF generation has been upgraded, and it might be possible to generate BFs more frequently (once per 4-5 days for every satellite?).

We use a different seed for every BF generation, which means that in the next round a different random 18.6% will survive on your node (not the same pieces). So over a week you have far less chance of keeping garbage (first round: ~82% deleted; next round: ~82% of the remainder plus ~82% of the new garbage).

More frequent BFs may cause higher disk utilization, but they would delete garbage earlier (files would be trashed sooner after deletion).

This is my opinion, but I would consider generating BFs more frequently…
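The compounding effect can be sketched with a few lines of Go (an illustration only, assuming each round’s false positives are independent thanks to the per-round seed, and ignoring newly created garbage):

```go
package main

import "fmt"

func main() {
	const fpr = 0.186 // expected per-round false-positive rate (10 MB BF, ~22.9M pieces)
	remaining := 1.0  // fraction of the original garbage still on disk
	for round := 1; round <= 3; round++ {
		remaining *= fpr // each round keeps only the false positives of the previous remainder
		fmt.Printf("after round %d: %.1f%% of the garbage left\n", round, 100*remaining)
	}
}
```

After two independent rounds, only about 3.5% of the original garbage would survive.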

I admittedly don’t know exactly how much it would normally deviate from the expected percentage by random chance. But that kind of feels like it’s significantly higher, like outside the realm of reasonable chance.

Actually, come to think of it. The formula to calculate the false positive rate you posted earlier included the number of garbage pieces as well.
Shouldn’t it be: 2^(10000003×8 ÷ 34133480 ÷ −1.44) ≈ 32%?

That number makes more sense, though I don’t really see how the amount of garbage could impact the match rate, as it isn’t known when creating the bloom filter.
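Both piece counts can be plugged into the same approximation (the `-1.44·log2(p)` bits-per-element formula quoted earlier, inverted to solve for p; the function name here is mine, this is a sketch rather than the satellite’s actual code):

```go
package main

import (
	"fmt"
	"math"
)

// fpRate inverts bitsPerElement = -1.44 * log2(p) to estimate the
// false-positive rate for a filter of sizeBytes holding n elements.
func fpRate(sizeBytes, n int) float64 {
	bitsPerElement := float64(sizeBytes) * 8 / float64(n)
	return math.Pow(2, -bitsPerElement/1.44)
}

func main() {
	const filterSize = 10000003 // bytes, from the retain log line
	fmt.Printf("n = 22.9M (satellite pieces): %.1f%%\n", 100*fpRate(filterSize, 22929122))
	fmt.Printf("n = 34.1M (pieces on disk):   %.1f%%\n", 100*fpRate(filterSize, 34133480))
}
```

Only the 22.9M figure corresponds to the number of elements actually inserted into the filter; the 34.1M variant merely reproduces the 32% number discussed above.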

That would certainly help, but this is not a long term fix. As the number of pieces grows the frequency would have to rise exponentially to compensate.

10MB bloom filters have made their way from US1 :partying_face:

2024-05-02T13:48:19-07:00	INFO	retain	Prepared to run a Retain request.	{"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-04-23T17:59:59Z", "Filter Size": 10000003, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
-rw-r--r-- 1 root   root  9.6M May  2 13:48 ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa-1714154399992924000

I’m worried that the optimal number of hashes for nodes with too many pieces is overestimated, because it is not recalculated after the filter’s size is truncated.

Let’s consider a node that contains 20 TB of data at a current average piece size of 252 kB. This means that the number of pieces it stores is:

n = 20 TB / 252 kB = 79365079

Going by the procedure that estimates the optimal number of hashes:

bitsPerElement  := -1.44 * math.Log2(falsePositiveRate) → 4.783576
hashCountInt := int(math.Ceil(bitsPerElement * math.Log(2))) → 4

giving 4. But this number would be optimal only if the bloom filter size were (going by Wikipedia):

m / n = -2.08 ln(ε) = -2.08 × ln(0.1) = 4.789377 bits per element
m = 4.789377 × n bits → 47 MB

However, you then change the filter size without adjusting the number of hashes.

For a filter of 10 MB and 4 hashes we have 92% false positive rate.

For a filter of 10 MB and 1 hash, which is the optimum in this case, we have 63% false positive rate.

(btw, I love the footer of that Bloom Filter calculator!)
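The two percentages above follow from the standard approximation p ≈ (1 − e^(−kn/m))^k; a quick check (the function name is mine, not from the Storj code):

```go
package main

import (
	"fmt"
	"math"
)

// fpRate estimates the false-positive rate of a Bloom filter with
// mBits bits, n inserted elements, and k hash functions:
// p ≈ (1 - e^(-k*n/m))^k
func fpRate(mBits, n, k int) float64 {
	return math.Pow(1-math.Exp(-float64(k)*float64(n)/float64(mBits)), float64(k))
}

func main() {
	const mBits = 10000003 * 8 // 10 MB filter
	const n = 79365079         // pieces in the 20 TB example
	fmt.Printf("k=4: %.1f%%\n", 100*fpRate(mBits, n, 4)) // ~92%
	fmt.Printf("k=1: %.1f%%\n", 100*fpRate(mBits, n, 1)) // ~63%
}
```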

Yes, we are sending out 10 Mbyte BFs. We will check whether the majority of the SNs have at least 1.101. If there are no other problems, we may use only 10 Mbyte in the future (or bigger :wink: )


Nice calculator. But why do you use 79365079 as the number of items? We had 22.9M segments on the node…

With 4 hash functions it’s ~20%, which is what we expected:

When will the 10MB BF come for EU1? I want to watch the logs…

Ah, sorry, I somehow missed the number of pieces given; I just took a guess. 20 TB is a nice round number still within the recommended size of a node, and if such a node had a true random sample of all pieces in the network, that would be the number of pieces it would store.

For BrightSilence’s node, for 4 hashes that calculator gives a false positive rate of 21.6%, while for 2 hashes it is reduced to 19.0%. For this node the difference is small, but the effect compounds after a few iterations.

Given that we observe the average piece size getting smaller and smaller while nodes keep growing, it might still be worth making this small change in the code to re-estimate the optimal number of hashes.
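The suggested change amounts to recomputing the hash count from the final (capped) filter size. A minimal sketch, assuming the textbook optimum k = (m/n)·ln 2 (the function name is mine):

```go
package main

import (
	"fmt"
	"math"
)

// optimalHashCount recomputes the number of hash functions after the
// filter size has been capped: k = round(m/n * ln 2), at least 1.
func optimalHashCount(mBits, n int) int {
	k := int(math.Round(float64(mBits) / float64(n) * math.Ln2))
	if k < 1 {
		k = 1
	}
	return k
}

func main() {
	const mBits = 10000003 * 8                     // 10 MB filter
	fmt.Println(optimalHashCount(mBits, 22929122)) // ~22.9M pieces: 2 hashes
	fmt.Println(optimalHashCount(mBits, 79365079)) // 20 TB example: 1 hash
}
```

This reproduces the numbers discussed above: 2 hashes for the 22.9M-piece node and 1 hash for the 79.4M-piece example.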

Finally the new 10MB BF was processed. This is my oldest node: 40 months, Synology, 18GB RAM, Exos drive. Almost 24h :blush: . Ready for the next one.

2024-05-03T01:17:10Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-04-23T17:59:59Z", "Filter Size": 10000003, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-04T00:42:33Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 8841137, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 41333620, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "23h25m22.955227817s", "Retain Status": "enabled"}

The Used Space finally looks good, coming down from 12.4TB.

Oh, that’s a nice number.

Going from k=3 (recommended for the optimal node size) to k=1 here reduces the false positive rate from 48.9% to 40.3%.


Totally agree. Fair point.

As far as I can see, the difference between the current calculation and the n/m calculation is only 1-5%, but we need those percentages…


My biggest nodes finished retain: 7 machines running 2 nodes each, 1 running a single node.

c21 - 1GB RAM
2024-05-03T02:38:19Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 497512, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 14971763, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "4h54m57.465579453s", "Retain Status": "enabled"}
c22 - 1GB RAM
2024-05-03T11:24:50Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 870972, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 16702227, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "6h50m56.643776581s", "Retain Status": "enabled"}
o11 - 10GB RAM
2024-05-04T11:56:39Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 8920068, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 39679616, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "38h13m18.232162572s", "Retain Status": "enabled"}
p11 - 18GB RAM
2024-05-04T00:42:33Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 8841137, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 41333620, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "23h25m22.955227817s", "Retain Status": "enabled"}
b11 - 18GB RAM
2024-05-04T07:29:39Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 10111171, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 52834911, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "32h36m50.416953082s", "Retain Status": "enabled"}
c11 - 18GB RAM
2024-05-03T20:01:26Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 8309277, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 38004048, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "19h28m55.526847405s", "Retain Status": "enabled"}
g11 - 18GB RAM
2024-05-03T19:46:26Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 9021317, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 45557230, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "24h40m13.705897155s", "Retain Status": "enabled"}
o21 - 18GB RAM
2024-05-04T00:25:33Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 9798363, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 46085859, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "27h52m2.24455936s", "Retain Status": "enabled"}
r11 - 18GB RAM, space used 11.02TB, trash 1.74TB, sat report 10.2TB
2024-05-03T23:02:28Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 9784271, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 45714626, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "28h54m40.065068586s", "Retain Status": "enabled"}

I put the dashboard data for the last one as a reference; this is now, after retain.
RAM has the biggest influence on walker speed, including the used-space and retain walkers.
You can see the difference between o11 and p11, which have almost the same piece count and pieces removed: 38h with 10GB vs 23h with 18GB. p11 has 2 nodes, o11 has one node.