Storagenode container using 100% cpu

SRS · April 10, 2020, 4:09am

Hi. I’m using Unraid OS and the storagenode Docker container is using 100% of the CPU. Need help fast. My whole fileserver is slow.
Here is a pic from the top command:

Pac · April 10, 2020, 7:22am

Looks to me like the system is mainly waiting on I/O operations (90+% wa), not the CPU.
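For anyone who wants to watch that wa figure outside of top, here is a minimal Python sketch, assuming a Linux host where /proc/stat is readable. The function names are made up for illustration; only the /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal) comes from the kernel documentation.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Only the first eight fields are summed, because the guest counters
    that follow are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total


def read_cpu_sample(path="/proc/stat"):
    """Read the current aggregate CPU counters (first line of /proc/stat)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Take one sample with read_cpu_sample(), wait a few seconds, take another, and pass both to iowait_percent(). A value that stays very high while user and system time stay low points at the disks rather than the CPU.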

Cross91 · April 10, 2020, 7:44am

Hi SRS,

What does docker stats say?

Mine, with Unraid 6.8.3, an i7 2600K and 32GB RAM.

Cross91 · April 10, 2020, 8:07am

Also netdata is a very good tool for monitoring your system. Give it a try.

Eioz · April 10, 2020, 8:13am

Hello, I can confirm that I have also seen high CPU usage for the past few days, for information. Thanks to the Storj community!

Cross91 · April 10, 2020, 8:21am

Maybe due to the increased traffic

SRS · April 10, 2020, 3:31pm

When the wa value is high, the server almost completely stops and it can’t handle all the data coming in.

hoarder · April 10, 2020, 4:11pm

I have a lot of iowait on one of the nodes. Another node host with the same drive but beefier hardware performs better.
I guess using old SMR drives might not be the best idea if this kind of workload continues.

LinuxNet · April 10, 2020, 4:32pm

same here oO

but only on my RaspberryPi4 with Debian & Docker. Everything ok on all other nodes with CentOS & Docker.

EDIT 12 hours later:

The process has calmed down.

SRS · April 15, 2020, 10:43pm

I still have the same problem. I have now changed the hardware to an HP ML350 Gen5 server.

SRS · April 16, 2020, 4:37am

I don’t know if this has anything to do with my problem, but I can see in Unraid’s process list that some commands are constantly using CPU:

26.1% cpu 0.6 531716 125304 ? Ssl 01:32 70:36 ./storagenode run --config-dir config --identity-dir identity --metrics.app-suffix=-alpha --metrics.interval=30m --contact.external-address=…

13.2% cpu 2.1 2270808 445044 ? Ssl 01:07 39:11 /usr/local/sbin/shfs /mnt/user -disks 31 20480000000 -o noatime,allow_other -o remember=330

12.3% cpu 0.0 0 0 ? S 01:07 36:33 [unraidd0]

Also, when wa goes up, there are some processes in the D state (waiting for disk):

D Apr15 0:49 [kswapd0]
D 06:15 0:00 [kworker/2:2+md]
D 01:07 0:59 [unraidd3]
D 01:07 0:01 [xfsaild/md3]
D 01:07 0:10 [xfsaild/md4]
D 06:00 0:00 [kworker/u16:5+flush-9:3]
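A list like the one above can also be collected with a script. This is a minimal Python sketch, assuming a Linux host with /proc mounted. The function names are hypothetical; the only thing taken from the kernel documentation is that /proc/<pid>/stat starts with the pid, the command name in parentheses, and a one-letter state.

```python
import os


def parse_stat(content):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = content.index("(")
    right = content.rindex(")")
    pid = int(content[:left])
    comm = content[left + 1:right]
    state = content[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable mid-scan
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_tasks() a few times while wa is high shows which kernel threads and processes are stuck waiting on the disk at that moment.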

I have run SMART tests on all the HDDs and all were OK.
I run a parity sync once a month and the read/write speed is around 75 MB/s, so I don’t think there is anything wrong with the disks.
I have tested all the disks by copying large files to each of them, and the network speed is 950Mbps without any problem.
Also, when I stop the storagenode Docker container, the server goes back to normal.
Is it possible that there is any problem with the databases?
This problem started after the last update. Some of my databases got malformed after the update.
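One way to answer the database question is an integrity check. The following is a minimal Python sketch, assuming the node's databases are ordinary SQLite files (the "malformed" wording matches SQLite's error message). The function name is hypothetical, and it is best run on a copy of each .db file while the node is stopped.

```python
import sqlite3
from pathlib import Path


def integrity_check(db_path):
    """Run SQLite's PRAGMA integrity_check and return the reported messages.

    A healthy database yields ["ok"]; anything else describes the damage.
    """
    # Open read-only so the check itself cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files fail before the pragma can report details.
        return [str(exc)]
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Running integrity_check() on each database file in the storage directory shows which ones, if any, are still damaged after the update.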

SRS · April 16, 2020, 8:26pm

Any suggestions on what to do?

deathlessdd · April 16, 2020, 8:30pm

Hit it with a hammer? But on a serious note, I have read in a few places that Unraid has issues with running Storj for some reason as well. I don’t personally run a node on it, but I have used Unraid for some projects, mostly for virtualizing GPUs. I think the problem with Unraid is that it really needs a high-end server to run flawlessly.

SRS · April 16, 2020, 8:43pm

But it’s so weird that this happened right after a storagenode version update and the databases becoming malformed. I really doubt this is an Unraid problem.

deathlessdd · April 16, 2020, 8:44pm

That is the issue: the way it gets mounted on Unraid causes the malformed databases. Which version of Unraid are you currently running? Not sure if this has been fixed yet, though.

SRS · April 16, 2020, 8:47pm

My node is running v1.1.1

deathlessdd · April 16, 2020, 8:48pm

I mean the version of Unraid.

SRS · April 16, 2020, 8:48pm

Ah, of course… 6.8.3

deathlessdd · April 16, 2020, 8:53pm

Oh OK, you are running the newest version, though. Unfortunately I can’t test it, but it could be a similar issue to one I have had with my node. I saw very high IO load on mine, and it’s related either to the size of the node or possibly to garbage collection running. It lasted days, until my node was full.
I found this issue, maybe it’s similar to yours: https://forums.unraid.net/bug-reports/stable-releases/672-docker-image-huge-amount-of-unnecessary-writes-on-cache-r733/

SRS · April 16, 2020, 9:35pm

The link you posted is about a lot of writes to the cache disk. I have a 240GB SSD (btrfs) and the docker.img is there. It’s 53.7GB and there is minimal activity on that cache disk.

I actually remember now that I upgraded Unraid since my node went down because of the malformed databases. I had 6.8.2 before. Maybe I should try to roll back to 6.8.2? Hmmm