There wasn’t a single GC run this week according to the built-in Prometheus metrics:
Usually my node dies on Wednesdays/Thursdays because of the GC, and there’s always an 8-9 hour run every Sunday morning.
oh, how do you monitor GC?
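One way to check, as a minimal sketch: assuming the node's debug endpoint serves Prometheus-format metrics and that the GC/retain series contain "retain" or "garbage" in their names (the port and those name filters are assumptions, not confirmed):

# Sketch: poll the storagenode's Prometheus-format metrics endpoint and print
# any series that look related to garbage collection / retain runs.
# METRICS_URL and the name substrings are assumptions; adjust to your setup.
import urllib.request

METRICS_URL = "http://localhost:5999/metrics"  # hypothetical debug/metrics address

def gc_related_metrics(url: str = METRICS_URL) -> list[str]:
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return [
        line for line in text.splitlines()
        if not line.startswith("#") and ("retain" in line or "garbage" in line)
    ]

if __name__ == "__main__":
    for line in gc_related_metrics():
        print(line)

Pointing Prometheus at the same endpoint gives the week-over-week view the post above refers to.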
Why does it die? With what error?
My nodes died last Monday and again today at exactly the same time (2023-09-25T13:30:00Z), with the same errors you described in the other thread:
2023-09-25 03:35:12,662 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:13,663 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:15,666 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:18,670 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:21,674 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:23,676 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:24,678 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:27,682 INFO waiting for storagenode, processes-exit-eventlistener to die
I wasn’t able to kill the container; only a reboot of the whole system helped. Since this happened two weeks in a row at exactly the same time, I don’t think it was a coincidence.
So there was a capacity issue with the database used by GC. We have now scaled up the storage and will continue to run GCs manually.