It’s just because you’re using a ridiculously small node for the test!
2024-06-20T01:48:40Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 7162, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 216616, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "6m54.754044s", "Retain Status": "enabled"}
2024-06-21T15:00:20Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 1205, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 210036, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "2m50.3943145s", "Retain Status": "enabled"}
2024-06-22T13:55:14Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 230, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 209148, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1m9.0478738s", "Retain Status": "enabled"}
GC processed a BF covering only ~200k files there, and moved just a few thousand of them to trash. It is not surprising that it got through such a small and simple task quickly.
And yet you present it as if everything were fine and there were no significant performance problems.
Whereas there are HUGE performance issues with all Storj filewalkers, including GC. On large real nodes, which now contain tens of millions of files, one complete garbage collector or used-space filewalker pass takes DAYS (in the worst cases even a few WEEKS), not minutes as you try to show.
Here is what it looks like for nodes with 30-90 million files (and that is only 5-15 TB of real data from the current Storj network):
2024-06-13T13:49:21+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 85549, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2431713, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h2m15.6866686s", "Retain Status": "enabled"}
2024-06-13T18:50:59+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 2802, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 466022, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "21m31.4692929s", "Retain Status": "enabled"}
2024-06-14T10:30:12+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 914, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 464037, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "7m33.4334479s", "Retain Status": "enabled"}
2024-06-20T18:31:40+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 6772007, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 28216438, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "150h44m39.3203172s", "Retain Status": "enabled"}
2024-06-21T15:47:08+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 47032, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2471054, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "36h24m47.0112751s", "Retain Status": "enabled"}
2024-06-23T05:59:17+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 17936, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2450671, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "28h40m45.1656485s", "Retain Status": "enabled"}
2024-06-27T07:18:13+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 995449, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 19788591, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "168h57m38.5748511s", "Retain Status": "enabled"}
~50 million files processed and ~8 million moved from /blobs/ to /trash/.
Note that processing a satellite with only 460k files/pieces stored took 7 and 21 minutes over two runs, not so far from your data for a satellite with 200k files. But for the satellite with 28M files (Saltlake) it took 150 hours!
And for another satellite with almost 20M files (US1) it was 168 hours: one full WEEK for GC to finish just this one BF for one satellite alone!
The relation here is nonlinear: you can't take the time the garbage collector needed for 200k files and claim that, for example, 40 million files will take "only" ~200 times longer (40,000k / 200k = 200). Let's say 15 hours instead of 5 minutes.
On small nodes, the metadata for the stored files is accessed very actively by GC (and by the other filewalkers and the main storagenode process), so after a while all or almost all of it ends up in the OS cache in RAM. From that point on, processing speeds up many fold or even by orders of magnitude, because GC now reads mostly from the RAM cache and not from the slow disk. On large nodes, by contrast, the volume of metadata is simply too big to fit in a RAM cache of any reasonable size (in the example above, the metadata size on disk approaches ~100 GB while the disk RAM cache was in the 5-8 GB range), so almost all of that data continues to be read from a slow HDD even with fully "warm" caches. The caches still work, just with a very low hit ratio. As a result, a filewalker run takes days instead of the hours expected from a simple linear extrapolation of data taken from a small node.
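The cache effect is easy to model. The sketch below is my own toy arithmetic; the latency figures are assumed ballpark values for a cached lookup versus a random HDD seek, not Storj measurements:

```go
package main

import "fmt"

// perPieceMs returns the average metadata-lookup cost per piece when a
// fraction `hit` of lookups are served from the OS page cache.
func perPieceMs(hit, cachedMs, seekMs float64) float64 {
	return hit*cachedMs + (1-hit)*seekMs
}

func main() {
	const cachedMs = 0.01 // lookup served from RAM cache (assumed)
	const seekMs = 10.0   // lookup paying a random HDD seek (assumed)

	small := perPieceMs(0.99, cachedMs, seekMs) // metadata fits in cache
	large := perPieceMs(0.05, cachedMs, seekMs) // ~100 GB metadata vs 5-8 GB cache
	fmt.Printf("small node: %.2f ms/piece\n", small)
	fmt.Printf("large node: %.2f ms/piece\n", large)
	fmt.Printf("ratio: %.0fx\n", large/small)
}
```

Same hardware, same code: just dropping the hit ratio from 99% to 5% makes each piece almost two orders of magnitude more expensive, which is how minutes turn into days.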
P.S.
Here is another GC log from another large real node. It has slightly more files and was also under high load from upload traffic during the GC runs, so GC took even longer to finish due to heavy contention for HDD IOPS with the main storagenode process serving customer (mostly Saltlake synthetic) uploads.
2024-06-13T09:54:18+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 31404, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 400686, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "1h26m32.2431335s", "Retain Status": "enabled"}
2024-06-13T13:10:06+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 95901, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1610581, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h28m0.733484s", "Retain Status": "enabled"}
2024-06-13T19:31:51+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 3117, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 370551, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "2h48m15.3468187s", "Retain Status": "enabled"}
2024-06-14T20:11:09+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 702, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 368671, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "5h50m24.5053326s", "Retain Status": "enabled"}
2024-06-20T16:50:34+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 44613, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1627415, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "6h41m16.8723145s", "Retain Status": "enabled"}
2024-06-23T08:48:53+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 17697, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1629771, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "31h44m26.8040361s", "Retain Status": "enabled"}
2024-06-25T09:52:00+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 8439223, "Failed to delete": 1, "Pieces failed to read": 0, "Pieces count": 36878846, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "263h21m57.4555898s", "Retain Status": "enabled"}
2024-06-30T09:39:55+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 1196785, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 18936780, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "248h22m25.9166617s", "Retain Status": "enabled"}
I guess the time has come to add "d" (days) and "w" (weeks) units to the Duration
output, in addition to "m" and "h". Months would be overkill. At least for now, but who knows…