Very high CPU load - 2000 threads

moon86 · June 23, 2020, 4:55pm

Hello,

Each time the traffic increases, I get a very high CPU load on several nodes.
Here is an example of the output of the top command:

top - 16:38:45 up  2:36,  1 user,  load average: 1055.97, 1055.97, 1055.50
Tasks: 156 total,   2 running,  87 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.6 us, 13.3 sy,  0.0 ni, 40.0 id, 42.1 wa,  0.0 hi,  2.1 si,  0.0 st
KiB Mem :   469192 total,     7492 free,   397996 used,    63704 buff/cache
KiB Swap:  1998844 total,   409108 free,  1589736 used.    54708 avail Mem

When I list the threads on the server, I see a lot of them from storagenode. Here is an example:

ps -eLf | grep storagenode 

root       1933   1904  58640  0 2292 16:49 ?        00:00:00 ./storagenode run --config-dir config --identity-dir identity --metrics.app-suffix=-alpha --metrics.interval=30m --contact.external-address=MYIP:PORT --operator.email=MYEMAIL --operator.wallet=0xMYWALLET --console.address=:14002 --storage.allocated-disk-space=11TB

I have more than 2000 threads like this one:

user@servername:~$ ps -eLf | grep storagenode | wc -l
2363

And this is happening on several of my nodes, not only one, even on servers with 4 cores.
Do you have any idea what is wrong?

Thank you in advance.

litori · June 23, 2020, 5:22pm

Looks like the load is coming from IO wait (42.1 wa in your top output), not CPU.
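
For anyone who wants to check this on their own node, here is a rough sketch, not an official Storj tool. On Linux the load average also counts threads in uninterruptible sleep (state D, usually waiting on disk), which is how the load can sit above 1000 while the CPU is 40% idle. The helper names `iowait_from_top` and `count_dstate` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of storagenode or procps.

# Read top's "%Cpu(s):" summary line on stdin and print the iowait ("wa") value.
iowait_from_top() {
    awk '/^%Cpu/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^wa,?$/) { print $(i - 1); exit }
    }'
}

# Read one state string per line on stdin (as printed by "ps -eLo stat=")
# and print how many threads are in uninterruptible sleep (state starts with D).
count_dstate() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}
```

Assuming the procps versions of top and ps, it could be used as `top -bn1 | iowait_from_top` and `ps -eLo stat= | count_dstate`. A high D-state count next to a high wa value points at the disks rather than the CPU.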

Seems like your swap file is used a lot since you do not have a lot of memory on the VM.

KiB Mem : 469192 total
KiB Swap: 1589736 used.

Maybe allocate more memory to the VMs so apps stay in memory instead of hitting the swap (which hits the disks back and forth). You can also change the swappiness (vm.swappiness) to 1 after you have allocated more memory, so swap is used as little as possible.
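
A minimal sketch for checking how much swap is in use, assuming a Linux host with the usual /proc/meminfo layout. The function name `swap_used_pct` is just an illustration, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of swap currently in use.
# Reads a file in /proc/meminfo format (defaults to /proc/meminfo itself).
swap_used_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END {
             if (total > 0) printf "%.1f\n", (total - free) * 100 / total
             else print "0.0"    # no swap configured
         }' "${1:-/proc/meminfo}"
}
```

With the numbers from the top output above (1998844 KiB total, 409108 KiB free) this gives about 79.5% of swap in use. To look at or change the swappiness itself, `sysctl vm.swappiness` prints the current value and `sudo sysctl -w vm.swappiness=1` sets it until the next reboot; putting `vm.swappiness=1` in a file under /etc/sysctl.d/ is the usual way to keep it.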

moon86 · June 23, 2020, 5:34pm

Yes, but I’m facing the same issue on a dedicated server with 4GB of RAM.
I’ll have a look at your suggestions. Thank you.

SGC · June 23, 2020, 7:19pm

IOwait is due to HDD latency. It just looks like CPU usage because the CPU waits for the HDDs to respond before continuing with its tasks, which slows everything to a crawl…
No amount of CPU will help with that… it’s a matter of the HDDs.
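
A rough way to see how slow the disks are answering, as a sketch only: it assumes the standard Linux /proc/diskstats field layout, and `avg_io_ms` is a made-up helper name. The counters are cumulative since boot, so this is a long-run average, not the current latency.

```shell
#!/bin/sh
# Hypothetical helper: average time per completed I/O (in ms) for one block
# device, computed from the cumulative counters in /proc/diskstats format.
# Usage: avg_io_ms DEVICE [FILE]   (FILE defaults to /proc/diskstats)
avg_io_ms() {
    awk -v dev="$1" '$3 == dev {
        ios = $4 + $8     # reads completed + writes completed
        ms  = $7 + $11    # ms spent reading + ms spent writing
        if (ios > 0) printf "%.1f\n", ms / ios
        else print "0.0"
    }' "${2:-/proc/diskstats}"
}
```

For example `avg_io_ms sda` on the node. For live per-interval numbers, `iostat -x 5` from the sysstat package shows the await columns directly.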

What kind of setup are you running?

The test data just resumed at a decent pace… so that’s most likely what is choking your HDDs. I can also see a significant increase in my iowait.