High load after upgrade to v. 1.3.3

yeah, I checked; there seems to be a lack of understanding of the problem, but we will get there eventually…
also, some things or suggestions can be difficult to implement, so we have to figure out a solution that is easy, makes sense, and can be proven to work before it's implemented…

no matter how well an idea is liked, it has to be practical, or else it will remain just an idea… at least for a long, long time…

up to 60–70% utilization on both my SSD write caches now, with a latency of 60 ms on one and about 160 ms on the other… not sure if I can get more performance out of them, but they cannot keep up, even though they are two separate devices grabbing writes… not sure how to make it faster… I tried each of them separately on the pool, then I tried them mirrored because I couldn't figure out how to stripe them. I sort of knew a mirror wouldn't work for a write cache, but I didn't know what else to do… that only lasted 5 minutes, because I could see immediately that it basically tripled or quadrupled my bandwidth usage to the SSDs, since they need to mirror each other on top of receiving and transmitting data… so yeah, just no…
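For the record, the setup that ended up load-balancing is just two separate (non-mirrored) log vdevs. A rough sketch of the commands, assuming a pool named "tank" and placeholder device paths (both are assumptions, not my actual names):

```shell
# Two SEPARATE log vdevs: ZFS distributes sync writes between them.
zpool add tank log /dev/disk/by-id/nvme-ssd-A /dev/disk/by-id/nvme-ssd-B

# A MIRRORED log (what I tried first) doubles the write traffic instead:
#   zpool add tank log mirror /dev/disk/by-id/nvme-ssd-A /dev/disk/by-id/nvme-ssd-B

# A log device can be removed again without harming the pool:
zpool remove tank /dev/disk/by-id/nvme-ssd-A

# Check how the writes are actually spread across the log devices:
zpool iostat -v tank
```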

now they seem to be load balancing quite nicely. one is clearly slower than the other, but their overall performance seems significantly better than each on its own… by a lot… like lots and lots… xD still not enough… and I doubt they would be better in a stripe; that's usually just worse for IO, if memory serves, because every write gets split across both drives, whereas here ZFS gets a choice of which SSD it wants to write to… and I would bet it prefers the faster one… but the load seems evenly distributed, sadly enough…

might have to get that Samsung 970 EVO Pro 1TB NVMe drive… xD then I would be past all that write latency nonsense… it won't change my read speeds, but it will change my uploads and maybe how well my VMs run. I turned the VMs off while testing, so they don't interfere with the results.
also, the system booted only 26 hours ago, so it's not really awake yet… the L2ARC isn't filled yet… I should figure out how to control the feed rate, though of course throttling it would just make the eventual warm-up after boot even slower… but I might do that… when I figure out how…
so maybe 24–48 hours more and it should start running as intended; then the SSD latency might drop quite a bit… also, the L2ARC device seems sluggish, and of course it also hosts a SLOG, so that's not really helping it…
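On controlling the feed rate: OpenZFS on Linux exposes the L2ARC fill rate as module parameters, `l2arc_write_max` and `l2arc_write_boost` (bytes written to L2ARC per feed interval; the boost applies extra allowance while the cache is still cold). A sketch, assuming Linux with OpenZFS and example values I picked arbitrarily:

```shell
# Show the current per-interval L2ARC write cap (bytes):
cat /sys/module/zfs/parameters/l2arc_write_max

# Raise it to 64 MiB per interval to warm the L2ARC faster
# (or lower it to slow the feed down):
echo $((64 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_max

# Extra write allowance that applies only until the L2ARC is warm:
echo $((128 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_boost

# Make the change persistent across reboots:
echo "options zfs l2arc_write_max=67108864 l2arc_write_boost=134217728" \
  >> /etc/modprobe.d/zfs.conf
```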

might try disabling the L2ARC… it's not really that useful for random reads and writes either… but it does help a ton with running VMs. I also wonder if the first log device gets more pressure than the second one that's added… it might matter and it might not… I kinda doubt it matters, though…
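Disabling the L2ARC doesn't have to mean pulling the device; ZFS has a per-dataset `secondarycache` property that controls what is eligible for L2ARC. A sketch, assuming a placeholder dataset name "tank/vms":

```shell
# Stop feeding this dataset into the L2ARC entirely:
zfs set secondarycache=none tank/vms

# Or cache only metadata, often a reasonable middle ground:
zfs set secondarycache=metadata tank/vms

# Re-enable full caching later (this is the default):
zfs set secondarycache=all tank/vms
```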

this is kind of interesting: the more I push the machine to the limits of what the hardware can handle, the further down my download success % goes… even while the overall download bandwidth is going up in my networking graphs. of course I will now have to take into account that it could be up to 10% less.

but apparently I'm now getting to the limits of what my drives can read… also, when I checked my system this morning, I had an HDD acting up; not quite sure what's going on with that… it seems to wander between drives, creating increased latency… now I get backlogs of up to 4 seconds at times on one drive, which is like 10x to 40x what the norm is for the others… and everything was fine before I went up to running the node on unlimited… not saying that's the cause… most likely it was just the catalyst that exposed an underlying issue with my system…
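For spotting which drive is building the backlog, `zpool iostat` can break latency out per vdev. A sketch, with "tank" again standing in for the actual pool name:

```shell
# Per-vdev latency columns (-l), refreshed every 5 seconds:
zpool iostat -v -l tank 5

# Latency histograms per vdev; a drive with a multi-second tail
# stands out immediately here:
zpool iostat -w tank
```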
also, like you saw, the graph had dropped off during the night…
alas, after a restart the node jumped back to life at full speed… which is kinda odd… I've seen it happen before, but it's been a long time since… cannot help but think it's due to overload from going unlimited.

========== AUDIT ==============
Critically failed:     0
Critical Fail Rate:    0.000%
Recoverable failed:    1
Recoverable Fail Rate: 0.231%
Successful:            432
Success Rate:          99.769%
========== DOWNLOAD ===========
Failed:                132
Fail Rate:             1.447%
Canceled:              678
Cancel Rate:           7.431%
Successful:            8314
Success Rate:          91.122%
========== UPLOAD =============
Rejected:              0
Acceptance Rate:       100.000%
---------- accepted -----------
Failed:                0
Fail Rate:             0.000%
Canceled:              6835
Cancel Rate:           16.594%
Successful:            34355
Success Rate:          83.406%
========== REPAIR DOWNLOAD ====
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            32
Success Rate:          100.000%
========== REPAIR UPLOAD ======
Failed:                0
Fail Rate:             0.000%
Canceled:              124
Cancel Rate:           16.188%
Successful:            642
Success Rate:          83.812%
========== DELETE =============
Failed:                0
Fail Rate:             0.000%
Successful:            656
Success Rate:          100.000%
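The percentages in the report are just each counter over the section total. A minimal sketch of the arithmetic, using the UPLOAD numbers from above:

```python
# Counters from the UPLOAD section of the report above.
failed, canceled, successful = 0, 6835, 34355
total = failed + canceled + successful

cancel_rate = 100 * canceled / total
success_rate = 100 * successful / total

print(f"Cancel Rate:  {cancel_rate:.3f}%")   # 16.594%
print(f"Success Rate: {success_rate:.3f}%")  # 83.406%
```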

after a node restart

but yeah, it might be related to delays caused by one of my drives acting up during the night.
kinda odd though; it would be nice to know what happens there…
when I got back to the machine, the bad HDD was at a 4-second delay; I shut down the node and it's all back to normal…
so I guess yet another bad drive… lol, I should start a repair shop

this is the 3rd or 4th one in 2 months. they ran for years without doing much… and now they are dropping like flies.