I am running an LSI SAS 9200-8e.
Ah, brand new MB and cheap controller…
The mainboard has PCIe 4.0 with 16 lanes.
The controller has PCIe 2.0 with 8 lanes, which
would equal the speed of 2 PCIe 4.0 lanes. In fact there is 75% iowait on 8 lanes in this case.
The mainboard SATA has a PCIe 4.0 x4 connection.
Are they SAS or SATA?
If SATA, please connect them to the mainboard.
If SAS, you are out of luck; a new controller, maybe. Or live with the iowait; maybe assigning 4 cores to the four nodes in Docker (or to Docker, if Storj only) makes it better.
It's likely a bad HDD. I had this problem in the past, and it was caused by a dying hard drive.
Wouldn't the corresponding node be failing audits, aside from errors and restarts in the log?
Not necessarily. Audits are random, and perhaps there are FATAL errors due to readability/writability check timeouts, which shut the node down and thus prevent its disqualification before it's too late.
But yes, a choking HDD may cause high iowait or even stall the whole system.
I don't see a problem with the controller being PCIe 2.0, if I understand the numbers correctly. The SAS disk is 6 Gbps, and PCIe 2.0 offers 5 Gbps per lane. The PCIe slot and the SAS controller both offer 8 lanes. So the PCIe slot can accommodate 8 disks at 5 Gbps, and the SAS controller can accommodate 8 disks at 6 Gbps. Being 1 Gbps lower per lane/disk on the PCIe side is not a big deal; the 6 Gbps speed is only theoretically possible anyway. The disks transmit at much lower speeds.
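A quick back-of-the-envelope sketch of this arithmetic. One refinement, stated as an assumption here: both PCIe 2.0 and SAS-2 use 8b/10b encoding, so usable throughput is about 80% of the line rate; the per-disk throughput figure is purely illustrative:

```python
# Nominal link-rate comparison; all figures are spec values, not measurements.
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 4.0 Gbps usable per lane.
PCIE2_LANE_GBPS = 5 * 8 / 10
# SAS-2: 6 Gbps line rate, also 8b/10b -> 4.8 Gbps usable per link.
SAS2_LINK_GBPS = 6 * 8 / 10

lanes = links = 8
pcie_total = PCIE2_LANE_GBPS * lanes   # host-side ceiling: 32 Gbps
sas_total = SAS2_LINK_GBPS * links     # disk-side ceiling: ~38.4 Gbps

# A spinning disk streams at roughly 150-250 MB/s sequential; take 200 MB/s
# as an illustrative figure -> well below either ceiling.
hdd_gbps = 200 * 8 / 1000              # 1.6 Gbps per disk

print(f"PCIe 2.0 x8 usable: {pcie_total:.1f} Gbps")
print(f"SAS-2 x8 usable:    {sas_total:.1f} Gbps")
print(f"one HDD streaming:  {hdd_gbps:.1f} Gbps")
```

Either way, eight spinning disks together stay far under the host link, so raw bandwidth is not the bottleneck here; random-I/O latency is.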
Any controller will use just one PCIe lane, of whatever PCIe version, for one disk.
Of course, the problem is not the throughput in the lanes.
But the CPU could do it 3 times faster, so I guess it has to wait.
Maybe?
Would be interesting to know whether iowait is real CPU usage or just displayed waiting time.
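On Linux, iowait (the `wa` column in top) counts time the CPU spent idle while at least one I/O request was outstanding, so it is not real CPU work. The kernel exposes the raw per-CPU tick counters in /proc/stat. A minimal parsing sketch, using a made-up sample line so it runs anywhere (the counter values are illustrative):

```python
# /proc/stat 'cpu' line layout (counters in USER_HZ ticks):
#   cpu  user nice system idle iowait irq softirq steal guest guest_nice

def iowait_percent(stat_line: str) -> float:
    """Return iowait as a percentage of total CPU time for one 'cpu' line."""
    fields = [int(x) for x in stat_line.split()[1:]]
    total = sum(fields)
    iowait = fields[4]  # fifth counter is iowait
    return 100.0 * iowait / total

# Made-up sample: 10000 total ticks, 2500 of them in iowait.
sample = "cpu  1000 0 500 6000 2500 0 0 0 0 0"
print(f"{iowait_percent(sample):.1f}% iowait")  # -> 25.0% iowait
```

On a live system you would feed it the first line of `open("/proc/stat").readline()`, sampled twice and differenced, since the counters only ever grow.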
IO wait seems to be down again. I did not change anything, so I don't know what the issue was. I have some older consumer-grade HDDs mixed in there, so maybe @Dali44 could be right about a dying HDD. I will monitor the issue.
I also don't think that the HBA bandwidth has anything to do with it. The HDD bandwidth usage is nowhere near the 6 Gbps limit.
Usually, it's a fragmented full/big node, or maybe an SMR drive slipped in.
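One hedged way to check for SMR on Linux: host-aware and host-managed SMR disks report their zone model in sysfs, while drive-managed SMR (the kind that usually slips into consumer drives) reports "none" and can only be confirmed from the model number and the vendor's datasheet. A sketch assuming a standard Linux sysfs layout:

```python
# Sketch: list the kernel's zone model for each sd* block device.
# "host-aware"/"host-managed" => SMR; "none" => CMR *or* drive-managed SMR.
from pathlib import Path

def zone_model(device: str) -> str:
    """Return the zone model for a block device name like 'sda'."""
    p = Path(f"/sys/block/{device}/queue/zoned")
    return p.read_text().strip() if p.exists() else "unknown"

for dev in sorted(Path("/sys/block").glob("sd*")):
    print(dev.name, zone_model(dev.name))
```

So a "none" result does not fully clear a drive; it only rules out the host-managed variants.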