I am running an LSI SAS 9200-8e.
Ah, brand new MB and cheap controller…
The mainboard has PCIe 4.0 with 16 lanes.
The controller has PCIe 2.0 with 8 lanes, which
would equal the speed of 2 PCIe 4.0 lanes. In fact there is 75% iowait on 8 lanes in this case.
The mainboard SATA has a PCIe 4.0 x4 connection.
Are they SAS or SATA?
If SATA, please connect them to the mainboard.
If SAS, you are out of luck; a new controller, maybe. Or live with the iowait; maybe assigning 4 cores to the four nodes in Docker (or to Docker, if Storj only) makes it better.
It's likely a bad HDD. I had this problem in the past, and it was caused by a dying hard drive.
Wouldn't the corresponding node be failing audits, aside from errors and restarts in the log?
Not necessarily. Audits are random, and perhaps there are FATAL errors due to readability/writability check timeouts, which shut the node down and thus prevent its disqualification before it's too late.
But yes, a choking HDD may cause high iowait or even stall the whole system.
I don't see a problem with the controller being PCIe 2.0, if I understand the numbers correctly. The SAS disk is 6 Gbps, and PCIe 2.0 offers 5 Gbps per lane. The PCIe slot and the SAS controller both offer 8 lanes. So the PCIe slot can accommodate 8 disks at 5 Gbps, and the SAS controller can accommodate 8 disks at 6 Gbps. Being 1 Gbps lower per lane/disk on the PCIe side is not a big deal; the 6 Gbps speed is only theoretically possible anyway. The disks transmit at much lower speeds.
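A quick back-of-the-envelope sketch of this arithmetic. One refinement, stated as an assumption here: both PCIe 2.0 and SAS-2 use 8b/10b encoding, so usable throughput is about 80% of the line rate; the per-disk throughput figure is purely illustrative:

```python
# Nominal link-rate comparison; all figures are spec values, not measurements.
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 4.0 Gbps usable per lane.
PCIE2_LANE_GBPS = 5 * 8 / 10
# SAS-2: 6 Gbps line rate, also 8b/10b -> 4.8 Gbps usable per link.
SAS2_LINK_GBPS = 6 * 8 / 10

lanes = links = 8
pcie_total = PCIE2_LANE_GBPS * lanes   # host-side ceiling: 32 Gbps
sas_total = SAS2_LINK_GBPS * links     # disk-side ceiling: ~38.4 Gbps

# A spinning disk streams at roughly 150-250 MB/s sequential; take 200 MB/s
# as an illustrative figure -> well below either ceiling.
hdd_gbps = 200 * 8 / 1000              # 1.6 Gbps per disk

print(f"PCIe 2.0 x8 usable: {pcie_total:.1f} Gbps")
print(f"SAS-2 x8 usable:    {sas_total:.1f} Gbps")
print(f"one HDD streaming:  {hdd_gbps:.1f} Gbps")
```

Either way, eight spinning disks together stay far under the host link, so raw bandwidth is not the bottleneck here; random-I/O latency is.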
Any controller will use just one PCIe lane, of whatever PCIe version, for one disk.
Of course, the problem is not the throughput in the lanes.
But the CPU could do it 3 times faster, so I guess it has to wait.
Maybe?
Would be interesting to know whether iowait is real CPU usage or just displayed waiting time.
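On Linux, iowait (the `wa` column in top) counts time the CPU spent idle while at least one I/O request was outstanding, so it is not real CPU work. The kernel exposes the raw per-CPU tick counters in /proc/stat. A minimal parsing sketch, using a made-up sample line so it runs anywhere (the counter values are illustrative):

```python
# /proc/stat 'cpu' line layout (counters in USER_HZ ticks):
#   cpu  user nice system idle iowait irq softirq steal guest guest_nice

def iowait_percent(stat_line: str) -> float:
    """Return iowait as a percentage of total CPU time for one 'cpu' line."""
    fields = [int(x) for x in stat_line.split()[1:]]
    total = sum(fields)
    iowait = fields[4]  # fifth counter is iowait
    return 100.0 * iowait / total

# Made-up sample: 10000 total ticks, 2500 of them in iowait.
sample = "cpu  1000 0 500 6000 2500 0 0 0 0 0"
print(f"{iowait_percent(sample):.1f}% iowait")  # -> 25.0% iowait
```

On a live system you would feed it the first line of `open("/proc/stat").readline()`, sampled twice and differenced, since the counters only ever grow.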
IO wait seems to be down again. I did not change anything, so I don't know what the issue was. I have some older consumer-grade HDDs mixed in there, so maybe @Dali44 could be right about a dying HDD. I will monitor the issue.
I also don't think that the HBA bandwidth has anything to do with it. The HDD bandwidth usage is nowhere near the 6 Gbps limit.
Usually, it's a fragmented full/big node, or maybe an SMR drive slipped in.
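One hedged way to check for SMR on Linux: host-aware and host-managed SMR disks report their zone model in sysfs, while drive-managed SMR (the kind that usually slips into consumer drives) reports "none" and can only be confirmed from the model number and the vendor's datasheet. A sketch assuming a standard Linux sysfs layout:

```python
# Sketch: list the kernel's zone model for each sd* block device.
# "host-aware"/"host-managed" => SMR; "none" => CMR *or* drive-managed SMR.
from pathlib import Path

def zone_model(device: str) -> str:
    """Return the zone model for a block device name like 'sda'."""
    p = Path(f"/sys/block/{device}/queue/zoned")
    return p.read_text().strip() if p.exists() else "unknown"

for dev in sorted(Path("/sys/block").glob("sd*")):
    print(dev.name, zone_model(dev.name))
```

So a "none" result does not fully clear a drive; it only rules out the host-managed variants.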