What happened to the GC this week?

There wasn’t a single GC run this week according to the built-in Prometheus metrics.

Usually my node dies on Wednesdays/Thursdays because of the GC, and there’s always an 8-9 hour run every Sunday morning.

Oh, how do you monitor GC?

Why does it die? With what error?

Hi @raert, is this the same issue that happened last time (11th June) with the storagenode binary?

My nodes died on Monday last week and again today at exactly the same time (2023-09-25T13:30:00Z), with the same errors you described in the other thread:

2023-09-25 03:35:12,662 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:13,663 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:15,666 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:18,670 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:21,674 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:23,676 WARN killing 'storagenode' (57) with SIGKILL
2023-09-25 03:35:24,678 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-09-25 03:35:27,682 INFO waiting for storagenode, processes-exit-eventlistener to die

I wasn’t able to kill the container, only a reboot of the whole system helped. Since this happened 2 weeks in a row at exactly the same time, I don’t think it was a coincidence.

So there was a capacity issue with the database used by GC. We have now scaled up the storage and will continue to run GCs manually.
