CPU usage high?

heartbeat · September 6, 2020, 9:23pm

Does this CPU usage look high?

This is a new machine running Lubuntu 20.04, with an NVMe drive and a 5TB drive on USB3. The CPU is an Intel Celeron J4115 with 8GB RAM, and the machine only has Storj running.

It doesn’t seem to be related to a large increase in traffic.

Does this look normal? I thought Storj only needs 1 core. I am looking to host another Storj node and maybe a few wallets for staking

BrightSilence · September 6, 2020, 10:22pm

It’s all IO wait, which means the CPU is not the bottleneck, but the HDD is. What model is it? Could it be an SMR drive? The high IO wait moments likely coincide with either used space calculation or garbage collection.
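For anyone who wants to put a number on "it's all IO wait", here is a minimal Python sketch that derives the iowait share from two readings of the aggregate `cpu` line in `/proc/stat` on Linux. The function names are made up for illustration, and it assumes a kernel that reports at least the first eight counters on that line.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))

def iowait_percent(before, after):
    """Share of elapsed CPU time spent waiting on I/O between two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["iowait"] / total if total else 0.0

def read_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the all-cores summary."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

To use it, call `read_sample()`, wait a few seconds, call it again, and pass both results to `iowait_percent`. A value that stays high while `user` and `system` stay low points at the disk rather than the CPU.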

heartbeat · September 7, 2020, 10:18am

It doesn’t say, but I suspect it is: 5TB in a 2.5 inch drive. Anything I can do about this? I can’t even “shuck” it and put it inside the machine.

BrightSilence · September 7, 2020, 10:27am

I don’t think the USB3 is your problem, but yeah, most of those larger 2.5" drives are SMR. Other SNOs have had some success by running another node on another HDD, effectively spreading the load. Other than something like that, there is likely not much you can do.

One other question, what file system are you using?

heartbeat · September 7, 2020, 10:41am

What do you mean by running another node on another HDD? I have a 2nd 2.5 inch 5TB which I was going to run as a second node with a second docker instance. I am using ext4 - is there a better one? I could test it with my new node

BrightSilence · September 7, 2020, 10:42am

That’s basically what I meant. Those nodes would split incoming data, so each HDD only has to deal with half the load. Ext4 is fine, I was just making sure you aren’t using NTFS on Linux. People have seen issues with that as well.

heartbeat · September 7, 2020, 10:53am

Yes, I originally started with NTFS - really bad idea!

This happens often. CPU at 100% and it always starts at 00:00… It will normally last a few days. Anything I should check? Still related to the HDD do you think?

BrightSilence · September 7, 2020, 10:55am

Since it’s still all IO wait, yes, it’s HDD related. There’s not much to check. You could check the HDD model and look up whether it’s SMR, but I’m pretty certain it will be. Those HDDs simply aren’t optimized for 24/7 write operations.
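As a rough sketch of the "check the HDD model" step, the kernel usually exposes a model string under `/sys/block/<device>/device/model` on Linux. The helper name and the `sys_block` parameter below are just for illustration, and a USB enclosure may report its own bridge name instead of the drive inside, so treat the result as a hint only.

```python
from pathlib import Path

def drive_model(device, sys_block="/sys/block"):
    """Return the model string the kernel reports for a block device such as 'sda'."""
    model_file = Path(sys_block) / device / "device" / "model"
    return model_file.read_text().strip()
```

Something like `drive_model("sdb")` gives a string you can then look up on the manufacturer's site. The string itself does not say whether the drive is SMR or CMR.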

heartbeat · September 7, 2020, 10:57am

ok, thanks for your help. I will try and swap it for an internal drive. This new case can house a few internal drives.

heartbeat · September 7, 2020, 11:22am

OK, I just read another forum post. Do you think starting a new node under the same IP will help reduce the I/O load? I should be able to get it up and running today.

BrightSilence · September 7, 2020, 11:31am

Yes, I do think it will. 2 HDDs will be better able to deal with the total load on your IP.

heartbeat · September 7, 2020, 11:44am

I will give it a go and report back in a week or so. Thanks!

Pac · September 7, 2020, 7:18pm

I’ve never seen 2.5" CMR disks bigger than 2TB, so I’d be pretty sure your disk is SMR. Even most 1TB disks are SMR these days, so manufacturers can cut prices down.

Personally, I’m using the --storage2.max-concurrent-requests option to go easy on such drives. Not ideal for the network, but that’s the only way I found to make a single SMR drive cope with the load. Without this option, my disk stalled so much that it got my node suspended, because it was unresponsive with all its databases locked.

This option should be used as a last resort I guess, so if you can go with 2 disks maybe it’ll be alright, even though the new node won’t be very active at first.
But not all SMR drives are created equal, so… worth a try.

SGC · September 8, 2020, 6:08am

@Pac
i used max concurrent for a long time, and my main issue with it was that it would create its own errors, simply caused by having a max set…
but ofc a few software problems rather than wearing an hdd out fast… that’s an easy choice…

@heartbeat
i would also recommend the dual node or triple node approach… beyond 3 the iops performance gains diminish so much it’s not worth it… i mean adding a 2nd node is +100% iops performance, while adding a 3rd is 2 nodes + 1 and thus only a 50% iops performance increase…

so to extend hdd life i would recommend anyone running storj to run 2 or 3 hdd / nodes or similar setups.
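The arithmetic behind those diminishing returns can be sketched like this. It assumes every drive contributes the same IOPS and that the load for one IP is split evenly across nodes, which is a simplification; `marginal_gain` is a name made up for this example.

```python
def marginal_gain(node_count):
    """Relative IOPS capacity gained by going from node_count - 1 drives to node_count."""
    if node_count < 2:
        raise ValueError("need at least two drives to compare")
    return 1 / (node_count - 1)

# Print the gain from each added drive and the share of load each drive then carries.
for n in range(2, 6):
    print(f"drive {n}: +{marginal_gain(n):.0%} capacity, "
          f"each drive handles {1 / n:.0%} of the load")
```

Under those assumptions the second drive doubles capacity, the third adds half again, and a fourth would add only about a third.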

by doing almost everything i can to remove iowait my storagenode errors are basically non existing…

this morning’s log success rate:

i know my uploads are slacking a bit, but i hope to get that fixed when i install my new ssd which works as a write cache… the current two SSDs i’ve got can’t keep up… which is weird because we are uploading… but apparently that causes a ton of writes… and ofc i’m running everything in sync writes for various reasons.

afaik anyways…

========== AUDIT ==============
Critically failed:     0
Critical Fail Rate:    0.000%
Recoverable failed:    0
Recoverable Fail Rate: 0.000%
Successful:            316
Success Rate:          100.000%
========== DOWNLOAD ===========
Failed:                0
Fail Rate:             0.000%
Canceled:              4
Cancel Rate:           0.018%
Successful:            21878
Success Rate:          99.982%
========== UPLOAD =============
Rejected:              0
Acceptance Rate:       100.000%
---------- accepted -----------
Failed:                0
Fail Rate:             0.000%
Canceled:              5
Cancel Rate:           0.058%
Successful:            8675
Success Rate:          99.942%
========== REPAIR DOWNLOAD ====
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            8829
Success Rate:          100.000%
========== REPAIR UPLOAD ======
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            2104
Success Rate:          100.000%
========== DELETE =============
Failed:                0
Fail Rate:             0.000%
Successful:            13100
Success Rate:          100.000%

the iowait spikes come from a bad hard drive that i still haven’t been able to fix… it keeps turning itself off, but it’s not too bad when it doesn’t get too much traffic…

oh yeah, you should check that APM in the HDD firmware is turned off, it has helped my latency… the drive is still broken… but works better, and even all my other hdds seemed to reduce latency / iowait in general.

twl · September 8, 2020, 6:11am

To be precise, there is one Toshiba CMR model with 3TB; all other 2.5in drives above 2TB use SMR.

Pac · September 8, 2020, 7:55am

Sure, but personally what I really dislike about this option is that it doesn’t feel like the best approach to me: Ideally I think it should be dynamic, and the node software should adjust this in real time to reject uploads (ingress) when the disk can’t keep up (for instance when the average write speed of pieces falls below a threshold maybe?), but should then go back to accepting more pieces in parallel when the disk is “calm” again.

Because right now, it statically caps the number of files to be written in parallel even though there are moments where SMR disks can cope with way more than that.
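To make the idea concrete, here is a purely hypothetical Python sketch of such a dynamic cap. It is not how the storagenode software works; the class name, thresholds and halving/recovery rule are all invented for illustration, and it is not thread-safe.

```python
from collections import deque

class AdaptiveUploadLimiter:
    """Hypothetical limiter: shrink the concurrency cap when writes are slow, grow it when fast."""

    def __init__(self, min_limit=2, max_limit=40, slow_mbps=1.0, window=20):
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.slow_mbps = slow_mbps          # below this average the disk counts as struggling
        self.limit = max_limit              # current cap on parallel uploads
        self.active = 0                     # uploads in progress
        self.speeds = deque(maxlen=window)  # recent per-piece write speeds in MB/s

    def try_accept(self):
        """Return True and reserve a slot, or False to reject the upload."""
        if self.active >= self.limit:
            return False
        self.active += 1
        return True

    def finish(self, megabytes, seconds):
        """Record a completed write and adjust the cap from the rolling average."""
        self.active -= 1
        self.speeds.append(megabytes / seconds)
        average = sum(self.speeds) / len(self.speeds)
        if average < self.slow_mbps:
            self.limit = max(self.min_limit, self.limit // 2)  # back off quickly
        else:
            self.limit = min(self.max_limit, self.limit + 1)   # recover gradually
```

Halving on slowdown and adding one slot on recovery is just one possible policy. The point is only that the cap follows the measured write speed instead of staying fixed.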

heartbeat · September 8, 2020, 1:58pm

Thanks everyone for your feedback.

I have been trying all night to get my second node up. I am hosting it on a second machine with this command:

sudo docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p :14002:14002 \
    -e WALLET="XXXXXX" \
    -e EMAIL="XXXXX@gmail.com" \
    -e ADDRESS="XXXX.ddns.net:28968" \
    -e STORAGE="4TB" \
    --mount type=bind,source=/home/XXX/storj/Identity/storagenode,destination=/app/identity \
    --mount type=bind,source=/mnt/storj2,destination=/app/config \
    --name storagenode2 storjlabs/storagenode:latest

and I currently have my port forwarding set up like this (I have just tried many different combos)

I have been reading this guide (Setting up second machine with storj on same network - #4 by BrightSilence) and I am trying to use:

node2: outside world => 28968 => router => 28967 => node machine2 [ => 28967 => docker container]

I think I have set it up right but Storj is showing offline? Happy to try another way if there is a better way. I might even stick it on the same box if you think that is better / easier

heartbeat · September 8, 2020, 2:00pm

The port tester at https://www.yougetsignal.com/ says port 28968 is open for my IP, so I think the port forwarding is set up OK?
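A similar check can be scripted. The sketch below only tests whether a TCP connection to a host and port succeeds, using Python's standard `socket` module; `port_is_open` is a name chosen for this example. Run from inside the LAN against the node machine, it shows whether the node is listening at all. It does not prove the router forwarding works, since that has to be tested from outside the network.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_open("192.168.1.50", 28967)` would check the node machine directly, where the IP address is only a placeholder for your own.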

BrightSilence · September 8, 2020, 2:04pm

This setup only works if you’re running the second node on a different machine. Hence why it says node machine2.

If you’re running them on the same machine, you will need to do this:
node2: outside world => 28968 => router => 28968 => node machine [ => 28967 => docker container]

Meaning the docker run command will have -p 28968:28967 and the router forwards public port 28968 to 28968 on the node machine.
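The two layouts can be written down and sanity-checked with a small Python sketch. The `NodePorts` structure and `find_mismatches` helper are invented for this example, and 28967 as the port inside the container is taken from the chains above. It only checks that the numbers line up from hop to hop, nothing more.

```python
from dataclasses import dataclass

@dataclass
class NodePorts:
    advertised: int        # port in the ADDRESS setting, e.g. XXXX.ddns.net:28968
    router_external: int   # public port the router listens on
    router_internal: int   # port the router forwards to on the node machine
    docker_host: int       # left side of docker's -p host:container
    docker_container: int  # right side of docker's -p host:container

def find_mismatches(p, container_listen=28967):
    """Return a list of readable problems; an empty list means the chain lines up."""
    problems = []
    if p.advertised != p.router_external:
        problems.append("ADDRESS port differs from the router's public port")
    if p.router_internal != p.docker_host:
        problems.append("router forwards to a port docker is not publishing")
    if p.docker_container != container_listen:
        problems.append("docker maps to a port the node is not listening on")
    return problems

# Second node on the same machine: router 28968 -> 28968, docker -p 28968:28967
same_machine = NodePorts(28968, 28968, 28968, 28968, 28967)
# Second node on its own machine: router 28968 -> 28967, docker -p 28967:28967
second_machine = NodePorts(28968, 28968, 28967, 28967, 28967)
```

Both `find_mismatches(same_machine)` and `find_mismatches(second_machine)` come back empty, while mixing the two (router forwarding to 28967 with `-p 28968:28967`) is flagged.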

heartbeat · September 8, 2020, 3:36pm

I am running the second Storj node on a second box. So I have specified:

-p 28967:28967

and set my router to send incoming data from 28968 to 28967