Steadily Increasing CPU usage

I've noticed that since the 21st my storage node's CPU usage has been steadily increasing, as shown in this picture. Green is CPU, orange is RAM. I can post logs, but they don't seem to show anything. Has anyone else run into this? I've restarted the container and the server with no change; it just restarts the climb. Normally it hovers around 15.

As you can see below, it doesn't seem to ever stop increasing. This shows the period before the increase started, and then after I restarted the container, which only restarted the climb.

Hi @LrrrAc, how old and how large is your node, and are you using an SMR or a CMR HDD?

I had exactly the same issue with my SMR HDD and consequently switched to CMR.

I recall checking the drive a while ago, and it was SMR. It's an 8TB WD white-label drive. The node is ~10 months old and holds 1.465TB. I don't know of anything that changed when the increase started. I have 90 days of history with basically the same CPU usage before it started climbing. I even updated the node version 3 days prior with no change.

Here's 90 days of history, with it on the same drive the whole time.

My guess is that this usage is due to I/O wait. Can you find that stat? If it is I/O wait, restarts might make it worse for a while, as the node runs the filewalker process on startup.
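If you don't already have a monitoring stat for it, one quick way to see the I/O-wait share on Linux is to read it straight from `/proc/stat` (field 6 of the aggregate `cpu` line is cumulative iowait jiffies). This is a rough since-boot average, not a live figure:

```shell
# Rough I/O-wait share since boot, read from /proc/stat.
# Field 6 of the aggregate "cpu" line is cumulative iowait jiffies;
# dividing by the sum of all fields gives the overall share.
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i; printf "iowait since boot: %.1f%%\n", 100*$6/t}' /proc/stat
```

For a live view, the `wa` column in `top` or `iostat -x 5` (from the sysstat package) shows the same thing per interval.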


Here's the I/O wait. It spikes right at the beginning of the increase, but levels back down. It only spikes again after the restart, which is normal for me.

Here's more context.


Did you recently update a node?

I updated it to 1.46.3 three days before the increase started. Can I downgrade to test without issues? I don't want to break my node.

It's a lot harder to downgrade if you're running through Docker; if you were running the binary, it would be pretty simple to do. But I don't see any problem with downgrading to an earlier version, as there were no major updates to the storagenode itself.

But it could be that you're running an SMR drive and there was some heavy data on the network for a few days.

Here's the same period as the previous graph. Nothing I can see changed halfway through that would cause a huge change in CPU usage. I should be able to just change the tag on the container; I just don't want to break it.
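For reference, changing the tag would look roughly like this. This is a sketch, not your exact command: it assumes a versioned image tag is published for the release you want, and the container name and run flags are placeholders you'd replace with your own:

```shell
# Sketch of a rollback by image tag; stop gracefully, remove the
# container, and re-run with the older tag. Data on the mounted
# volume is untouched. Flags elided -- reuse your original run command.
docker stop -t 300 storagenode
docker rm storagenode
docker run -d --name storagenode \
    ... \
    storjlabs/storagenode:<previous-version>   # placeholder tag
```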

Just realized I said it was SMR. It's CMR, not shingled. The good one. lol

haha, that's good news at least :wink:

Hi @LrrrAc. Yes, it looks weird. Getting the output of /debug/pprof/profile would help to determine the problem.

Thanks for helping! How do I get this output? I'm running on Linux through Docker.
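A sketch of how that capture could look, assuming the node's debug endpoint is enabled and reachable from the host (the port here is a placeholder; the actual address depends on how the node's debug listener is configured and whether the port is published from the container):

```shell
# Hypothetical setup: debug endpoint listening on port 6060 and
# reachable from the host. Captures a 30-second CPU profile.
curl -sS -o profile.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# The result is a binary, gzip-compressed protobuf (hence gibberish
# under cat); it can be inspected locally with Go's pprof tool.
go tool pprof -top profile.pprof
```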

What file extension is this? It won't let me upload it without one. Also, using cat on the file outputs gibberish. Here's the output that I got. Let me know if I messed it up.

your memory usage is way too high; that is generally only caused by the disk being unable to keep up with the incoming writes.

unless you are doing other stuff on the disk, it should be highly unlikely that a CMR disk wouldn't be fast enough, but I think we did have some new traffic highs on the network not long ago... so I suppose it's possible...

my money is on your disk, for whatever reason, not being able to keep up... your memory usage is grotesque

this is a node of similar size.

the node will use more memory the slower the disk gets, pretty much. it does seem like it sometimes will use more memory for various filesystem tasks, not sure why... but I have pretty much ruled out that it's only disk latency...

but in 95% of all cases, excessive or increasing memory usage means the disk cannot keep up.
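An easy way to keep an eye on that directly, assuming the container is named "storagenode" (adjust to whatever name you used):

```shell
# One-shot snapshot of the container's CPU and memory usage;
# drop --no-stream for a continuously updating view.
docker stats --no-stream storagenode
```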

i'm sharing a Proxmox monthly max CPU + memory graph for a 3.6TB node with 16 threads

sure, my memory usage could most likely also be much better... because my setup isn't perfect either...

how is your storage connected? and which model is your HDD exactly?
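If you're not sure of the exact model or connection, something like this usually turns it up (device names are examples, and `smartctl` needs the smartmontools package installed):

```shell
# List block devices with model, size, transport (sata/usb/...),
# and whether the device reports as rotational.
lsblk -o NAME,MODEL,SIZE,TRAN,ROTA

# Full identity info for one drive; /dev/sda is an example device.
smartctl -i /dev/sda
```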

GC (garbage collection) in combination with atime updates might cause high load.


yeah, totally forgot to think about that...
I also disable xattr on my ZFS pool to make it run better... supposedly
atime is like 1 extra IO per read/write and xattr is like 2 IOs, if I remember correctly.
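Concretely, checking and changing those properties looks like this. "tank/storagenode" is a placeholder dataset name; and rather than turning xattr off entirely, `xattr=sa` is the common tuning on Linux, storing xattrs in the inode instead of hidden directories, which avoids the extra IOs:

```shell
# Placeholder dataset name; substitute your own pool/dataset.
zfs get atime,xattr tank/storagenode

zfs set atime=off tank/storagenode   # stop access-time writes on reads
zfs set xattr=sa  tank/storagenode   # system-attribute xattrs, fewer IOs
```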