Egress seems to stress my setup

As I'm sure many of us have noticed, egress has been going up lately…

But so has my IOwait… I do have a disk that's acting up a bit, but nothing of note; the pool will run without it, so it should be fine… however, it is causing a fair amount of IOwait…

So I'm pretty sure this is due to the increase in egress, but I was wondering whether anyone else is seeing a rise in their IOwait / HDD backlog / HDD latency / seek time… whatever we want to call it.
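
If anyone wants to compare numbers, something like the following shows per-disk latency alongside the CPU iowait column (assuming the sysstat package is installed; sdX is a placeholder for whichever disk you suspect):

```
# extended per-device stats, refreshed every 5 seconds: watch await, %util and the iowait column
iostat -x 5
# or narrow it down to one disk
iostat -x 5 sdX
```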

I know it will vanish if I replace the drive… which I might, because it makes me nervous to watch my IOwait exceed my CPU utilization %. But I'm asking to try to determine whether anyone else is having trouble with this… my setup is vastly overpowered compared to 95-98% of all storage nodes… so if I'm seeing this, others might have issues…

So it would be nice to determine whether this is a network-wide thing with the current workload, or just my one failing redundant drive…

It might also just be the heat and a bad connection somewhere… I've seen that a few times before… but it might be a few days until I get cables to test that out… and I'm pretty sure it's the disk… it's been weird for a few months, and it doesn't matter which bay it's in…

No, all quiet and normal on my side. Where are you located geographically?

Denmark, Northern Europe.

Maybe it's the heat making a bad connection somewhere worse…
Or the disk is simply dying hard… even though it's still working and the temperature looks fine…
It managed to rack up 30k load cycles in 1-2 days… so now I've tried to disable APM using hdparm.
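
For anyone fighting the same thing, this is more or less the idea (assuming the drive is /dev/sdX and actually honours APM settings; not every drive accepts level 255):

```
# how fast is the head-parking counter climbing?
smartctl -A /dev/sdX | grep -i load_cycle
# APM level 255 disables power management so the heads stop parking constantly
hdparm -B 255 /dev/sdX
# read the setting back to confirm it stuck
hdparm -B /dev/sdX
```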

I have seen the IOwait thing a few times with bad connections, though… so I wanted to pull the drive without rebooting… of course my notes of the serial numbers were wrong and I pulled the wrong one, lol.

So I noted down the right serial and its location and put the drive back… and now I'm running a scrub to ensure my pool's data integrity.

Of course the scrub also seems to run slow…
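
For reference, the scrub itself is just the usual pair of commands ("tank" is a placeholder for the real pool name):

```
# start a scrub of the whole pool
zpool scrub tank
# progress, scan speed and per-disk error counters; a struggling drive usually shows up here first
zpool status -v tank
```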

I'm seeing about 2 MByte/s egress and 500 kByte/s ingress on average at the moment.
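
(If anyone wants to compare, one quick way to eyeball live throughput, assuming vnstat is installed and eth0 is the node's interface:)

```
# live traffic on the node's network interface
vnstat -l -i eth0
```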

Well, that is not something that should saturate a normal HDD. Any errors on the HDD? Sometimes bad cables cause things like that, and periodic spin-down/spin-up is not good either.
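
If you want to rule out cabling, the CRC counter is a quick check (assuming smartmontools is installed; sdX is a placeholder for the suspect device):

```
# overall SMART health verdict
smartctl -H /dev/sdX
# attribute 199 (UDMA_CRC_Error_Count) climbing usually points at cables/backplane rather than the platters
smartctl -A /dev/sdX | grep -i -e crc -e reallocated -e pending
```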

No recent or relevant errors… but it's been acting up for a while in all kinds of odd ways. It's in a raidz1 of 3 drives, though, and with a bit of shuffling around I can free up a disk to replace it…
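
The swap itself should just be the usual zpool replace dance, something like this (pool name and disk IDs are placeholders):

```
# optionally take the flaky disk offline first
zpool offline tank OLD_DISK_ID
# swap it for the freed-up spare and let the resilver run
zpool replace tank OLD_DISK_ID NEW_DISK_ID
# watch resilver progress
zpool status -v tank
```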

I've got two extra 6 TB drives paired with a 3 TB in an adjacent raidz1.

I've got 3 of those 3-drive raidz1 vdevs making up a pool of 9 drives in total.
I was kind of planning on adding the last 6 TB to get 2 of the raidz1s up to running full 6 TB drives and thus gain a bit of extra space… but I've also been pondering buying 14 TB drives next time I buy drives, so buying a 6 TB kind of goes against that… So if one of them is dying, that kind of answers the question of what type of drives I'll buy next…

Sadly that means I will be looking at buying 4x 14 TB drives… my wallet isn't too happy with that prospect, lol. The pain of working with ZFS.

Hmmm, just checked… my load cycle count is up to 400k, from 330k a day or two ago… it's basically just crashing and burning…

Going to shut down and make an attempt at saving it… at this pace it won't last out the month.

I get egress between 5-9 Mbps and my setup doesn't care at all. Querying the node API actually takes more IO and CPU time than all the egress :smiley:
And I'm running 3 nodes on the same raidz1.
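
(By "querying the node API" I just mean hitting the local web dashboard endpoints, roughly like this; the default dashboard port is 14002, jq is optional, and the exact paths can differ between node versions:)

```
# node overview from the local dashboard API
curl -s http://localhost:14002/api/sno/ | jq .
# per-satellite breakdown
curl -s http://localhost:14002/api/sno/satellites | jq .
```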

The only issues I have with IOwait on my side are due to SMR drives. But in my case it's usually a lot of ingress that causes issues, not egress.

I did a full shutdown and took an hour of downtime to tinker and clean the connectors… I also managed to get my fans running slower, so now I'm down to a 250 watt draw, which is a win…

But now I've moved it to a new controller, using new cables, and hopefully that will sort things out,
so I can get through my scrub and look at replacing the drive… maybe it will be fine once the system settles…

I'm at about 10-15 Mbit egress at the moment… sometimes closing in on 20 Mbit…

Well, the disk is still acting up… going to get rid of it ASAP… but it will most likely take a few days to get to that point… I really need to stop using my spares for personal storage, lol.
