Node uses 100% CPU - how to debug

One of my nodes is constantly using 100% CPU (one full core).


No filewalkers, trashwalkers, or anything else is running.
The disk has an iowait of about 20%, so there is no pressure there.
Upload/download traffic is the same as on my other nodes, so nothing special there either.

How do I debug this? I have the debug server enabled, but how do I search for the problem?

I would start by capturing a trace and staring at a flame graph.
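One way to get such a flame graph is to pull a CPU profile from the debug server. The sketch below is a minimal, hedged example that assumes the debug server exposes Go's standard net/http/pprof handlers under `/debug/pprof/`. The address `127.0.0.1:5999` is hypothetical; pass the value of your node's `debug.addr` setting instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds the URL of Go's standard CPU-profile endpoint.
// ASSUMPTION: the node's debug server serves net/http/pprof under /debug/pprof/.
func profileURL(debugAddr string, seconds int) string {
	return fmt.Sprintf("http://%s/debug/pprof/profile?seconds=%d", debugAddr, seconds)
}

// fetchProfile records a CPU profile for the given number of seconds and
// writes the raw pprof data to path.
func fetchProfile(debugAddr string, seconds int, path string) error {
	// The server only answers after the sampling window ends, so allow extra time.
	client := &http.Client{Timeout: time.Duration(seconds+30) * time.Second}
	resp, err := client.Get(profileURL(debugAddr, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	// HYPOTHETICAL default address: replace it with your node's debug.addr value.
	addr := "127.0.0.1:5999"
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	if err := fetchProfile(addr, 30, "cpu.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("wrote cpu.pprof")
}
```

The saved file can then be opened with `go tool pprof -http=:8080 cpu.pprof`, whose web UI includes a flame graph view. A 30-second window should be long enough to catch periodic work that repeats every few seconds.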

I did.

Every 10 seconds a thread starts, issues “millions” of pread64 calls, and finishes with a single pwrite64 to the SQLite database. Then the thread is killed. This loops forever.
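To confirm a pattern like this and see which file descriptors the reads and writes hit, one option is to record the node with strace, for example `strace -f -tt -e trace=pread64,pwrite64 -p <PID> -o node.trace`, and tally the calls. The sketch below is a minimal, hypothetical helper for that; it assumes strace was run with `-f`, so every line carries a thread id, either as a leading number (with `-o`) or as `[pid N]` (without `-o`).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"sort"
)

// callRe matches strace -f lines such as
//
//	12345 12:00:00.123456 pread64(7, ...
//	[pid 12345] pread64(7, ...
//
// and captures the thread id, the syscall name and its first argument (the fd).
// "<... pread64 resumed>" halves do not match, so each call is counted once.
var callRe = regexp.MustCompile(`^(?:\[pid\s+)?(\d+)\]?\s+(?:[\d:.]+\s+)?(pread64|pwrite64)\((\d+),`)

// key identifies one (thread, syscall, file descriptor) combination.
type key struct{ tid, syscall, fd string }

// parseLine extracts the key from a single strace line, if it is a call we track.
func parseLine(line string) (key, bool) {
	m := callRe.FindStringSubmatch(line)
	if m == nil {
		return key{}, false
	}
	return key{tid: m[1], syscall: m[2], fd: m[3]}, true
}

// tally counts pread64/pwrite64 calls per thread, syscall and fd.
func tally(r io.Reader) (map[key]int, error) {
	counts := make(map[key]int)
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // strace lines can be long
	for sc.Scan() {
		if k, ok := parseLine(sc.Text()); ok {
			counts[k]++
		}
	}
	return counts, sc.Err()
}

func main() {
	var in io.Reader = os.Stdin
	if len(os.Args) > 1 {
		f, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		defer f.Close()
		in = f
	}
	counts, err := tally(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	keys := make([]key, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	// Busiest (thread, syscall, fd) combinations first.
	sort.Slice(keys, func(i, j int) bool { return counts[keys[i]] > counts[keys[j]] })
	for _, k := range keys {
		fmt.Printf("tid=%s %-8s fd=%s count=%d\n", k.tid, k.syscall, k.fd, counts[k])
	}
}
```

A thread with a huge pread64 count on one fd and a single pwrite64 on the same fd would match the loop described above; the fd number then points at the file being scanned.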

I could not identify which database was at fault (all integrity checks were OK). I deleted all databases and let the node recreate them, as described here: https://support.storj.io/hc/en-us/articles/4403032417044-How-to-fix-database-file-is-not-a-database-error

Now the problem is gone and the node is working as expected.
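Identifying the database at fault might have been possible by mapping the file descriptor passed to pread64/pwrite64 (their first argument) to a path. On Linux, every open descriptor of a process appears as a symlink under `/proc/<PID>/fd`. The sketch below is a hypothetical helper that lists only SQLite-looking files; it assumes the node's databases end in `.db`, that it runs on the host with the node's host PID, and that it has permission to read that process's `/proc` entries (same user or root).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// isSQLiteFile reports whether a path looks like an SQLite database or one of
// its companion files. ASSUMPTION: the node's databases use the ".db" suffix.
func isSQLiteFile(p string) bool {
	return strings.HasSuffix(p, ".db") ||
		strings.HasSuffix(p, ".db-wal") ||
		strings.HasSuffix(p, ".db-shm") ||
		strings.HasSuffix(p, ".db-journal")
}

// openDBFiles maps each open file descriptor of pid to its target path,
// keeping only SQLite-looking files. Linux-only: it reads /proc/<pid>/fd.
func openDBFiles(pid int) (map[int]string, error) {
	dir := filepath.Join("/proc", strconv.Itoa(pid), "fd")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	out := make(map[int]string)
	for _, e := range entries {
		fd, err := strconv.Atoi(e.Name())
		if err != nil {
			continue
		}
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // the fd may have been closed since ReadDir
		}
		if isSQLiteFile(target) {
			out[fd] = target
		}
	}
	return out, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: dbfds <pid>")
		os.Exit(2)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		os.Exit(2)
	}
	files, err := openDBFiles(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fds := make([]int, 0, len(files))
	for fd := range files {
		fds = append(fds, fd)
	}
	sort.Ints(fds)
	for _, fd := range fds {
		fmt.Printf("fd %d -> %s\n", fd, files[fd])
	}
}
```

Because the fd table changes as databases are opened and closed, it is best to run this while the busy loop is active, so the descriptor seen in the syscalls still points at the same file.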


Perhaps it was either the TTL database or one of the piece usage databases.

I left the piece_expirations folder and its files intact. Was that a good idea, or should I have deleted them too?

I think it’s a good idea; otherwise, the expired pieces would be moved to the trash by the garbage collector instead of being deleted directly.
