There wasn’t a single GC run this week according to the inbuilt Prometheus metrics:
Usually my node dies on Wednesdays/Thursdays because of the GC, and there’s always an 8-9 hour run every Sunday morning.
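For reference, the inbuilt metrics can be scraped from the node’s debug endpoint; a minimal sketch, assuming debug.addr is pinned to 127.0.0.1:5999 in the config (by default it listens on a random port) and that the /metrics path and metric names match your version:
# scrape the node's debug endpoint and look for GC-related metrics
# (the port, the path, and the "gc" substring are assumptions)
curl -s http://127.0.0.1:5999/metrics | grep -i "gc"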
oh, how do you monitor GC?
Why does it die? With what error?
Hi @raert, is this the same issue that happened last time (11th June) with the storagenode binary?
My nodes died last Monday and again today at exactly the same time (2023-09-25T13:30:00Z), with the same errors as described by you in the other thread:
2023-09-25 03:35:12,662 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:13,663 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:15,666 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:18,670 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:21,674 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:23,676 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:24,678 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:27,682 INFO waiting for storagenode, processes-exit-eventlistener to die
I wasn’t able to kill the container; only a reboot of the whole system helped. Since this happened two weeks in a row at exactly the same time, I don’t think it was a coincidence.
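For what it’s worth, when SIGKILL has no effect like this, the usual cause is a process stuck in uninterruptible sleep (state D, typically blocked on disk I/O), which the kernel cannot kill; only clearing the I/O stall or a reboot helps. A quick sketch to confirm (the container name storagenode is an assumption):
# get the container's main PID, then list processes stuck in D state
docker inspect -f '{{.State.Pid}}' storagenode
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'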
So there was a capacity issue with the database used by GC. We have now scaled up the storage and will continue to run GCs manually.
Hello @raert,
Welcome back!
Unlikely.
But your local setup could suffer from these issues; see:
Also chiming in: it doesn’t appear that any of my nodes have received a GC bloom filter this week.
This is easy to check (if you have at least the info log level):
docker logs --tail 20 storagenode 2>&1 | grep "gc-filewalker" | tail
If you redirected logs, you need to check the log file directly:
grep "gc-filewalker" /mnt/storj/storagenode1/storagenode.log | tail
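If nothing shows up, it may also be worth checking whether a run started but failed; a sketch, assuming the same log path (the exact error wording is an assumption):
# look for gc-filewalker lines that mention an error or failure
grep "gc-filewalker" /mnt/storj/storagenode1/storagenode.log | grep -iE "error|failed" | tail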
node3:
2024-03-15T10:45:18+03:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"satelliteID":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2024-03-06T17:59:59Z", "bloomFilterSize": 1040007}
2024-03-15T10:53:56+03:00 INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-03-15T18:29:44+03:00 INFO lazyfilewalker.gc-filewalker subprocess started {"satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-15T18:29:44+03:00 INFO lazyfilewalker.gc-filewalker.subprocess Database started {"satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-03-15T18:29:44+03:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode", "createdBefore": "2024-03-11T17:59:59Z", "bloomFilterSize": 468621}
2024-03-15T18:30:44+03:00 INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-17T09:01:41+03:00 INFO lazyfilewalker.gc-filewalker subprocess started {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-03-17T09:01:42+03:00 INFO lazyfilewalker.gc-filewalker.subprocess Database started {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2024-03-17T09:01:42+03:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "createdBefore": "2024-03-12T17:59:59Z", "bloomFilterSize": 627676}
2024-03-17T09:03:30+03:00 INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
node2:
2024-03-14T19:10:07Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-03-15T13:15:45Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-15T13:15:46Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-03-15T13:15:46Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode", "createdBefore": "2024-03-11T17:59:59Z", "bloomFilterSize": 573881}
2024-03-15T13:21:56Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-17T08:34:59Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-03-17T08:35:00Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2024-03-17T08:35:00Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "createdBefore": "2024-03-12T17:59:59Z", "bloomFilterSize": 1861556}
2024-03-17T09:04:04Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
node5:
2024-03-15T10:32:42Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "createdBefore": "2024-03-06T17:59:59Z", "bloomFilterSize": 648625}
2024-03-15T10:35:37Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-03-15T13:16:04Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-15T13:16:04Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-03-15T13:16:04Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode", "createdBefore": "2024-03-11T17:59:59Z", "bloomFilterSize": 182725}
2024-03-15T13:17:11Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-17T06:29:12Z INFO lazyfilewalker.gc-filewalker subprocess started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-03-17T06:29:12Z INFO lazyfilewalker.gc-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode"}
2024-03-17T06:29:12Z INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "createdBefore": "2024-03-12T17:59:59Z", "bloomFilterSize": 250311}
2024-03-17T06:31:29Z INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
So it seems the last run was on 2024-03-17.
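For reference, a one-liner sketch that extracts that date automatically, assuming the log path and line format shown above:
# print the timestamp of the most recent successful GC pass
grep "gc-filewalker subprocess finished successfully" /mnt/storj/storagenode1/storagenode.log | tail -1 | cut -d' ' -f1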
If there was a decision to halt it, it would be nice to know.
There was no such decision.
It looks like there was an issue with saving the Bloom filters before they were sent out.
I am debugging the problem, and the next BF generation has already started.