Steadily Increasing CPU usage

LrrrAc · January 24, 2022, 6:39pm

I've noticed that since the 21st my storage node's CPU usage has been increasing steadily, as shown in this picture. Green is CPU, orange is RAM. I can post logs, but they don't seem to show anything. Has anyone else run into this? I've restarted the container and the server with no change; it just restarts the increase. Normally it hovers around 15.

As you can see below, it doesn't seem to ever stop increasing. This shows the period before the increase started, and after I restarted the container, which just restarted the increase.

Bivvo · January 24, 2022, 6:48pm

Hi @LrrrAc, how old and how large is your node, and are you using an SMR or CMR HDD?

I had exactly the same issue with my SMR HDD and switched to CMR as a result.

LrrrAc · January 24, 2022, 6:56pm

I recall checking the drive a while ago and it was SMR. It's an 8TB WD white label drive. The node is ~10 months old and 1.465TB. I don't know of anything that changed when it started increasing. I have 90 days of history with basically the same CPU usage, and then it started increasing. I even updated the node version 3 days prior with no change.

LrrrAc · January 24, 2022, 6:57pm

Here's 90 days of history with it on the same drive the whole time.

baker · January 24, 2022, 7:03pm

My guess is that this usage is due to I/O wait. Can you find that stat? If it is I/O wait, restarts might make it worse for a while as the node runs the filewalker process on node start.
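For anyone wanting to pull that stat by hand on Linux, here is a minimal sketch of how the I/O wait share can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The function names are made up for illustration, and monitoring tools may compute the figure slightly differently.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples.

    Field order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice.
    """
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Sum user..steal only: guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()
```

To use it, call `read_cpu_line()` twice a few seconds apart and pass both lines to `iowait_percent`. A value that stays high outside of node startup would support the I/O wait theory.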

LrrrAc · January 24, 2022, 7:12pm

Here's the I/O wait. It spikes right at the beginning of the increase, but levels back down. It only spikes again after the restart, which is normal for me.

LrrrAc · January 24, 2022, 7:14pm

Here's more context.

deathlessdd · January 24, 2022, 7:16pm

Did you recently update a node?

LrrrAc · January 24, 2022, 7:18pm

I updated it to 1.46.3 three days before the increase started. Can I downgrade to test without issues? I don't want to break my node.

deathlessdd · January 24, 2022, 7:22pm

It's a lot harder to downgrade if you're running through Docker; if you were running the binary it would be pretty simple to do. But I don't see any problem with downgrading to an earlier version, as there are no major updates to the storagenode itself.

But it could be that you're running an SMR drive and there was some heavy data over the network for a few days.

LrrrAc · January 24, 2022, 7:27pm

Here's the same period as the previous graph. Nothing I can see changed halfway through that would cause a huge change in CPU usage. I should be able to just change the tag on the container. I just don't want to break it.

LrrrAc · January 24, 2022, 7:30pm

Just realized I said it was SMR. It's CMR. Not shingled. The good one. lol

Bivvo · January 24, 2022, 7:50pm

haha, that’s good news at least

Andrii · January 24, 2022, 7:52pm

Hi LrrrAc. Yes, it looks weird. Getting the output of /debug/pprof/profile would help determine the problem.

LrrrAc · January 24, 2022, 7:57pm

Thanks for helping! How do I get this output? I'm running on Linux through Docker.

littleskunk · January 24, 2022, 7:58pm

LrrrAc · January 24, 2022, 8:14pm

What file extension is this? It won't let me upload it without one. Also, using cat on the file outputs gibberish. https://transfer.sh/vL9fuA/profile Here's the output that I got. Let me know if I messed it up.
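On the gibberish: profiles served by Go's pprof endpoints are gzip-compressed protocol buffers, so binary output from cat is expected, and such files are commonly saved with a `.pb.gz` suffix. Below is a small sketch for sanity-checking a download before sharing it. The helper names are hypothetical; it only confirms the gzip wrapper and does not decode the profile itself.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def decompressed_size(data):
    """Decompress a gzip payload and return the size of the inner data.

    Raises an error if the payload is truncated or not valid gzip.
    """
    return len(gzip.decompress(data))


def check_profile_file(path):
    """Read a downloaded profile and report whether it looks intact."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not looks_like_gzip(data):
        return "not gzip: probably an error page or an incomplete download"
    return "gzip OK, %d bytes after decompression" % decompressed_size(data)
```

If `check_profile_file("profile")` reports a valid gzip stream, the file is most likely fine to hand over as-is; `go tool pprof` reads the compressed file directly.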

SGC · January 24, 2022, 10:32pm

your memory usage is way too high, that is generally only caused by the disk being unable to keep up with the incoming writes.

unless you are doing other stuff on the disk, it should be highly unlikely that a CMR disk wouldn’t be fast enough, but i think we did hit some new highs not long ago that hadn’t been reached before… so i suppose it’s possible…

my money is on your disk, which for whatever reason cannot keep up… your memory usage is grotesque

this is a node of similar size.

the node will use more memory the slower the disk gets… pretty much. it does seem like it will sometimes use more memory for various filesystem tasks, not sure why… but i have pretty much ruled out that it’s only disk latency…

but in 95% of all cases excessive or increasing memory usage means the disk cannot keep up.

i’m sharing a proxmox monthly max cpu + memory graph for a 3.6TB node with 16 threads

sure my memory usage could most likely also be much better… because my setup isn’t perfect either…

how is your storage connected? and which model is your hdd exactly?

littleskunk · January 24, 2022, 10:37pm

GC in combination with atime might cause high load.
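To see whether atime updates are in play on the storage mount, one option is to look at the mount options listed in `/proc/mounts`. This is a rough sketch with made-up helper names. It only reports which atime-related option is listed and says nothing about how much load it actually causes.

```python
ATIME_OPTIONS = ("noatime", "relatime", "strictatime")


def atime_mode(mount_line):
    """Return the atime-related option found in one /proc/mounts line.

    Line layout: device, mount point, filesystem type, options, dump, pass.
    """
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("not a valid /proc/mounts line")
    options = fields[3].split(",")
    for mode in ATIME_OPTIONS:
        if mode in options:
            return mode
    return "unspecified"


def atime_mode_for(mounts_text, mount_point):
    """Find mount_point in the text of /proc/mounts and return its atime mode."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return atime_mode(line)
    return None
```

Reading `/proc/mounts` into a string and calling `atime_mode_for(text, "/mnt/storj")` (the path is just a placeholder) shows what the node's disk is mounted with. Mounting with `noatime` is the usual way to avoid the extra metadata writes on reads.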

SGC · January 24, 2022, 10:42pm

yeah totally forgot to think about that…
i also disable xattr on my zfs pool to make it run better… supposedly
atime is like 1 extra io per read/write and xattr is like 2 io… if i remember correctly.

GC?