I have trawled the forums and I'm still a little unclear on what triggers garbage collection. With all the test data, 75% of my storage is in limbo, waiting to be moved to trash for over a week.
I thought I read it's every 7 days? Only it's been longer than that and it's still mounting up. Is there anything I can do to trigger it, or to check when it plans to run?
June 2024 (Version: 14.0.0) [snapshot: 2024-06-13 19:03:38Z]
REPORTED BY TYPE METRIC PRICE DISK BANDWIDTH PAYOUT
Node Ingress Upload -not paid- 10.97 TB
Node Ingress Upload Repair -not paid- 4.40 GB
Node Egress Download $ 2.00 / TB (avg) 33.48 GB $ 0.07
Node Egress Download Repair $ 2.00 / TB (avg) 2.72 GB $ 0.01
Node Egress Download Audit $ 2.00 / TB (avg) 210.18 KB $ 0.00
Node Storage Disk Current Total -not paid- 4.96 TB
Node Storage ├ Blobs -not paid- 4.95 TB
Node Storage └ Trash ┐ -not paid- 11.64 GB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 3.10 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 3.11 TB
Satellite Storage Disk Last Report -not paid- 1.85 TB
Satellite Storage Disk Average So Far -not paid- 810.43 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 333.05 GBm $ 0.50
Thanks - based on the ingress, and the known short-lived data from SL, it's believable, as I have been watching it increase.
If I understand this correctly, then I have 75% of my allocated storage taken up with data I am not getting paid for, and it is not being cleared out by GC promptly. Surely this is going to cause issues - if a full drive were in this state, and this was preventing me from taking paying data, that would be a bad situation.
I think if the plan is to move to short-TTL data, then GC has to run more frequently.
This could be the case only when the missed data would be backfilled from the satellite side (US1 and SLC in particular), and when that happens you would still have a discrepancy between the physically used space on the disk (not on the dashboard) and the usage reported by the satellites (Average Disk Used Space This Month).
The discrepancy between the used space on the dashboard (pie chart) and the usage on the disk means that your databases are not updated with the correct usage. This can be fixed by enabling the scan on startup (it's enabled by default) and restarting the node. You then need to wait until the used-space-filewalker has completed successfully for all trusted satellites.
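As a rough sketch, you can check how far the used-space filewalker has got by grepping the node log. The log location varies by setup (for a Docker node, pipe `docker logs storagenode 2>&1` into grep instead); the sample lines below are illustrative, with the message text taken from a real node log posted earlier in this thread.

```shell
# Use a temp file here purely for illustration; point LOG at your real node log.
LOG=$(mktemp)

# Two sample log lines; a real log should show one used-space success per satellite.
printf '%s\n' \
  '2024-06-14T06:55:33Z INFO lazyfilewalker.used-space-filewalker subprocess finished successfully' \
  '2024-06-14T06:55:33Z INFO lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully' \
  > "$LOG"

# Count completed used-space runs - you want one per trusted satellite:
grep -c 'used-space-filewalker.*finished successfully' "$LOG"   # prints 1 here
```

If the count is lower than the number of trusted satellites, the scan is still running (or has failed - grep the same log for ERROR lines).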
You also need to remove data of the untrusted satellites, if you still have it:
There is no such plan. TTL has been here since the alpha. You will always have data both with a TTL and without; they are not mutually exclusive.
For example, if you deleted the databases, the TTL information is lost, so that data will then be collected by GC instead.
@Alexey Thank you, but I am now more confused than before.
I guess I assumed this was uncollected trash, as that is how it's labelled in the earnings calculator script output that I posted above.
This is a new, clean node - about 6 weeks old - so I do not believe I should have anything from untrusted satellites, as I started after this change.
I have restarted the node, and a grep for "filewalker" returned success:
2024-06-14T06:55:33Z INFO lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully
Still, I have 3.3TB of uncollected garbage, or 60% of my storage, according to the earnings calc script, which matches what the dashboard says in terms of used space. The thing is, this used space in the dashboard is misleading, as most of it appears to be unpaid. I worry I am quickly filling up the allocated storage, and in another week the disk will be full - and little of it will attract payment.
All help greatly appreciated here. Anything else I can try or look at?
My bandwidth has been maxed out for days - 12TB in - the dashboard reports 5.21TB used, trash is a few GB. Available space is disappearing fast. The earnings calc reports 3.28TB in uncollected garbage.
No idea what the real story is here. How much storage has Storj actually used? What's worrying is that the payout information on the dashboard suggests 381GB of disk usage for the month - which I know is an average, but the maths just doesn't add up.
June 2024 (Version: 14.0.0) [snapshot: 2024-06-14 14:07:03Z]
REPORTED BY TYPE METRIC PRICE DISK BANDWIDTH PAYOUT
Node Ingress Upload -not paid- 11.80 TB
Node Ingress Upload Repair -not paid- 7.05 GB
Node Egress Download $ 2.00 / TB (avg) 45.16 GB $ 0.09
Node Egress Download Repair $ 2.00 / TB (avg) 2.72 GB $ 0.01
Node Egress Download Audit $ 2.00 / TB (avg) 238.59 KB $ 0.00
Node Storage Disk Current Total -not paid- 5.24 TB
Node Storage ├ Blobs -not paid- 5.23 TB
Node Storage └ Trash ┐ -not paid- 11.64 GB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 3.27 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 3.28 TB
Satellite Storage Disk Last Report -not paid- 1.95 TB
Satellite Storage Disk Average So Far -not paid- 869.25 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 381.94 GBm $ 0.57
________________________________________________________________________________________________________+
Total 381.94 GBm 11.85 TB $ 0.66
Estimated total by end of month 869.25 GBm 26.17 TB $ 1.51
So my questions still stand, I'm afraid - if this is indeed used and expired data, when can I expect it to be freed up again?
What is the current storage metric that attracts payment?
a) Is it the 5.21TB used, as per the dashboard? (I know it probably isn't this, if data has expired.)
b) Is it the 1.95TB as reported in the earnings calc?
c) Is it the 381GB as reported in the payout dashboard?
Or none of the above?
Some bugs, like the average disk used space this month, will be fixed anyway (it's a reporting issue on the satellites, US1 and SLC in particular, see Avg disk space used dropped with 60-70%); some are filewalker related (though perhaps not bugs but errors - check your logs).
At the moment I can only suggest using df --si -T as the point of truth. Or wait until all filewalkers have finished their job for each trusted satellite. There are 4 more, not only the used-space-filewalker, did you know?
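To illustrate the df approach: compare the filesystem-level numbers against the per-directory blobs and trash totals. The paths below are stand-ins created in a temp directory purely so the commands run anywhere; on a real node, point them at your storage mount (e.g. the directory holding storage/blobs and storage/trash).

```shell
# Illustrative layout only - substitute your actual storage location.
STORAGE=$(mktemp -d)
mkdir -p "$STORAGE/blobs" "$STORAGE/trash"

# Filesystem-level view (the "point of truth" referred to above):
df --si -T "$STORAGE"

# Per-directory view: blobs (live pieces) vs trash (awaiting deletion):
du --si -s "$STORAGE/blobs" "$STORAGE/trash"
```

If df shows far more used than blobs + trash account for (after the filewalkers finish), the discrepancy is on the disk itself, not just in the node's databases.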
So I can now see that 3 of the 4 satellites completed; the 4th, SLC, started and is still running - slowly… I can see trash going up slowly as well. I'm surprised it's not faster given the server specs, but that's another thing to explore.
Why has it got so far out of step, though, to the tune of 3.5TB in just a few days? Surely we should not need to restart the node often just to recover space - should we? Is this not a process that should be running regularly, to keep the GC work manageable so it's not taking days?
Update - the used-space filewalker has now completed on all satellites and I have success messages in the logs.
I still have a huge disconnect between the dashboard and the earnings calculator. With the filewalker now finished, can I trust the dashboard? The earnings calc is still saying 2.5TB in uncollected garbage. Nothing aligns. Where does the earnings calc get its info?
Node Storage Disk Current Total -not paid- 7.51 TB
Node Storage ├ Blobs -not paid- 4.99 TB
Node Storage └ Trash ┐ -not paid- 2.52 TB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 2.49 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 5.00 TB
Satellite Storage Disk Last Report -not paid- 2.50 TB
Satellite Storage Disk Average So Far -not paid- 984.10 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 463.80 GBm $ 0.69
Maybe someone can also explain to me, as a newbie, how GC is supposed to work, as it still seems something is broken and the node needs constant attention, which was never the plan.
My understanding from the forums was that the Salt Lake test data was all TTL data; if that's the case, why did it need the used-space walker to move it to trash? Or was the Salt Lake data store-and-delete, in which case there seems to be the potential to disable storage nodes with trash that does not get cleared or removed in a similar time frame to its creation.
I'm just trying to get clear in my mind how this is all supposed to work, and whether there is a problem here.
I guess it's because the lazy mode is enabled and your node is accepting traffic from the customers. So there are three possible solutions:
1. Disable the lazy mode, so all filewalkers run with normal priority. This will likely reduce the success rate for your node, so it will be offered to the customers less often, but that should be ok.
2. Wait until it finishes.
3. Set the allocation below the current usage (this will stop ingress) and restart the node. This will give the lazy filewalkers a higher priority.
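For reference, a sketch of what options 1 and 3 look like as config.yaml entries. The key names here assume current storagenode releases and the values are purely illustrative - verify both against your own config.yaml before editing, and restart the node afterwards for the change to take effect.

```shell
# Write an illustrative config fragment to a temp file (stand-in for config.yaml).
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
# option 1: disable the lazy (low-priority) filewalker
pieces.enable-lazy-filewalker: false
# option 3: drop the allocation below current usage to stop ingress
storage.allocated-disk-space: 5.00 TB
EOF
cat "$CONFIG"
```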
This is a separate issue, not related to the filewalkers at all - it's a reporting issue from the US1 and SLC satellites; see Avg disk space used dropped with 60-70%.