ZFS SLOG weird netdata graphs

TL;DR
Questions…
Why does this happen…?
How do I solve it?

So I was seeing some latency issues from the SATA SSDs I was using as my SLOG device, and then bought an enterprise SSD, which seems to be working great…

I use netdata quite a bit on my server because it gives me access to a ton of information that would be difficult to get at otherwise. Even though I kind of hate netdata for being a crappy program, that doesn't mean it's not a useful tool… so I like to see it working correctly.

The latency issues my SLOG was creating are all gone. However, I'm not exceedingly well versed in Linux yet, having only used Proxmox / Debian for some 7 months now. I know my way around and am not too afraid of doing deep dives to try to understand why something isn't working,
but it's not always easy to figure out where to begin.


Of course it doesn't always happen. It seems like it's working correctly, though maybe the device is faster than what netdata can display, and then when it really gets up to speed the latency is low enough and the throughput high enough that some of netdata's math goes wrong… just guessing…

But like this… a few minutes later it correctly shows 500 kB/s writes, which seems like a perfectly reasonable number…

The device is a “Lenovo” io3 Enterprise Value,
which is really an SX300 card. To make it work with Debian I installed community-made open-source drivers (an update of the original driver package), which seem to work without any other issues.
Of course it's possible the issue is related to that, but I like to think it's somehow related to netdata's programming, because it's usually netdata that gives me grief rather than the SSD drivers, which have worked flawlessly as far as I know.
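
One thing I still want to try is watching the kernel's own block-layer counters directly, since that is ultimately what netdata's disk collector reads; if those raw numbers look sane while the graphs don't, the drivers are off the hook. Just a rough sketch, and it assumes the VSL driver exposes normal statistics for the fioa device shown in the fio-status output further down:

    grep fioa /proc/diskstats    # raw kernel counters (I/Os, sectors, time spent) for the card
    iostat -x fioa 1             # same counters turned into throughput, IOPS and await (sysstat package)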

The device is from the Fusion ioMemory series, a 1.6 TB PCIe 2.0 x8 model, because PCIe 2.0 was the max my motherboard supported and I didn't want any incompatibility issues or potential poor performance due to poor compatibility.
It works great; I've tested it to a few GB/s and even up to 4 GB/s, but when that comes from caching it's not really that impressive; I believe HDDs get close to that range from cache as well.
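
For anyone wanting to repeat that kind of test without caching muddying the numbers, something like the fio benchmarking tool (not to be confused with the fio-* VSL utilities) with the page cache bypassed should show the raw device speed. A sketch only; /dev/fioa3 is a made-up scratch partition and this would overwrite whatever is on it:

    # sequential 1M writes straight to the device, page cache bypassed via --direct=1
    fio --name=seqwrite --filename=/dev/fioa3 --rw=write --bs=1M \
        --ioengine=libaio --iodepth=32 --direct=1 --runtime=30 \
        --time_based --group_reporting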

Of course one will run into PCIe bus limitations at some point… anyway, it all works.

I'm using the ioMemory VSL drivers.
fio-status output:

fio-status -a

Found 1 VSL driver package:
   4.3.7 build 1205 Driver: loaded

Found 1 ioMemory device in this system

Adapter: ioMono  (driver 4.3.7)
        1600GB Enterprise Value io3 Flash Adapter, Product Number:00D8431, SN:11S00D8431Y050EB58T005
        ioMemory Adapter Controller, PN:00AE988
        Product UUID:8f616656-45e4-5109-a790-6f766c059382
        PCIe Bus voltage: avg 12.15V
        PCIe Bus current: avg 0.68A
        PCIe Bus power: avg 8.21W
        PCIe Power limit threshold: 24.75W
        PCIe slot available power: 25.00W
        PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
        Connected ioMemory modules:
          fct0: 07:00.0,        Product Number:00D8431, SN:11S00D8431Y050EB58T005

fct0    Attached
        ioMemory Adapter Controller, Product Number:00D8431, SN:1504G0637
        ioMemory Adapter Controller, PN:00AE988
        Microcode Versions: App:0.0.15.0
        Powerloss protection: protected
        PCI:07:00.0, Slot Number:53
        Vendor:1aed, Device:3002, Sub vendor:1014, Sub device:4d3
        Firmware v8.9.8, rev 20161119 Public
        1006.00 GBytes device size
        Format: v501, 1964843750 sectors of 512 bytes
        PCIe slot available power: 25.00W
        PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
        Internal temperature: 43.31 degC, max 47.74 degC
        Internal voltage: avg 1.01V, max 1.01V
        Aux voltage: avg 1.80V, max 1.81V
        Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
        Active media: 100.00%
        Rated PBW: 5.50 PB, 99.97% remaining
        Lifetime data volumes:
           Physical bytes written: 1,455,959,980,072
           Physical bytes read   : 941,636,499,552
        RAM usage:
           Current: 786,365,440 bytes
           Peak   : 803,504,640 bytes
        Contained Virtual Partitions:
          fioa: ID:0, UUID:94d66bf0-2410-43fe-a33b-ef602e135305

fioa    State: Online, Type: block device, Device: /dev/fioa
        ID:0, UUID:94d66bf0-2410-43fe-a33b-ef602e135305
        1006.00 GBytes device size
        Format: 1964843750 sectors of 512 bytes
        Sectors In Use: 853700546
        Max Physical Sectors Allowed: 1964843750
        Min Physical Sectors Reserved: 1964843750

The drive is formatted to about 60% of its capacity for performance reasons (over-provisioning). I'm not sure whether I was supposed to leave the rest unformatted, or format it to 100% and then keep the partitions at a maximum of 60% of the total; I wasn't able to find anything on that, so I assume it doesn't matter.

I used fdisk to make the partitions on the virtual drive that the SSD OS / firmware creates to interact with the host bus.

I reformatted the drive to a 512-byte sector size because the rest of my main pool is running 512 B; I wasn't able to run it at 4K without errors being thrown, because the rest of the pool was 512…
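
For reference, this is roughly the sequence that got the card to 1006 GB at 512-byte sectors. I'm writing the flags from memory, so treat them as an assumption and check fio-format --help and the VSL documentation first, because the format step wipes the card:

    fio-detach /dev/fct0                 # the card has to be detached before formatting
    fio-format -b 512 -s 60% /dev/fct0   # 512-byte sectors, ~60% of raw capacity for over-provisioning (flags from memory)
    fio-attach /dev/fct0                 # re-attach so /dev/fioa shows up again
    fdisk /dev/fioa                      # then partition the virtual block device as usual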

One idea I had is that maybe the virtual partitions the SSD firmware creates aren't supposed to be split into smaller partitions. I haven't really tested that theory out, and I kind of doubt it will lead anywhere…

If anyone has any ideas about what's going on with netdata, I would be very interested in hearing them…
Currently the device is attached as SLOG and L2ARC… I can detach it without any issue, but I am planning to start running my OS from it, at which point complete removal or reformatting of the card becomes a bit more problematic…
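
Attaching and detaching it is nothing exotic, just the standard zpool commands; pool and partition names below are placeholders for my layout:

    zpool add tank log /dev/fioa1      # one partition as SLOG
    zpool add tank cache /dev/fioa2    # another as L2ARC
    zpool remove tank /dev/fioa1       # both can be removed again without harming the pool
    zpool remove tank /dev/fioa2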

I know I should have a dual-boot option… but I currently don't… and since I don't have, or plan on making, a mirrored solution, I will need some sort of secondary boot into a backup copy of the OS in case of failure.

So that will wait for another day… if anyone is up for the challenge, I'm all ears…
I have been told it might be some sort of disconnect between the drivers and the OS… but I'm not sure I buy that; they seem to work just fine…

Also, if you happen to have a ridiculously fast SSD, how does it behave in netdata? Can netdata figure out how to show its super-low latency? Could it be the parallel access to my SSD that makes it go all crazy? Plenty of questions, not many answers, lol.

So let me sum up again.

I have one simple proposition for you: just attach any SATA SSD to your system and move the SLOG and L2ARC to this temporary drive. Then look at the netdata graphs and compare them with the graphs for the Fusion-io drive… I think you will be surprised :slight_smile:

SATA SSDs give me normal graphs, just like HDDs, only with higher speeds and lower latency.

Operations don't show, though…

So yeah, I don't know what you mean…

This is with the SLOG on the SATA and the PCIe SSD at the same time…
The spike on the PCIe SSD doesn't really correlate to anything… well, it does match perfectly with the second-highest spike in the SATA SSD activity…

So it does seem to register something… it just shows it wrong.

I also tried updating my netdata a few times and even reinstalling it and rebooting the server, and looked to see if there was any easy-to-find information online, which I didn't have much luck with… very specific problem with a very limited user base… so yeah…

Excellent! Now I will explain why you see this “issue”:
The Fusion-io card is not a usual drive, and netdata can't show you the real performance of the device, only what passes between hardware and software. The key difference is the Fusion ioMemory driver: all the “magic” is done in the driver (it aggregates writes and then flushes them to the real device, and it also consumes host memory). So on a Fusion-io drive you can't see the same details as on a regular SSD; the driver acts like a small RAM disk, caching and aggregating, then flushing that data to the real device, and that is exactly what you are seeing.
I strongly recommend not using the Fusion-io card for SLOG; it will be a big problem if power is lost, because part of the written data is held in host memory, and that is a problem.

It has PLP (power loss protection) and is actually very well suited for this, having internal ECC and a hybrid RAID5-type layout with self-healing when problems occur. Its write speeds and IOPS are also quite good, even though reads are of course better…

Yeah, I kind of figured there was a chance it couldn't be solved…
This is a very different device compared to regular SSDs, though I'm sure many modern SSDs use some of the things that were invented for this type of SSD.

I don't get why netdata can't just show how fast it's going… I mean, it's not like there isn't data being sent in the meantime… and ZFS can read the latency just fine, so the data is there… netdata is just being stupid about it…
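
That's also how I sanity-check it: ZFS reports per-vdev latency on its own, no netdata required. The pool name is a placeholder, and the histogram option needs a reasonably recent ZFS on Linux:

    zpool iostat -v tank 1    # per-vdev bandwidth and IOPS every second
    zpool iostat -l tank 1    # adds total/disk/sync/async wait (latency) columns
    zpool iostat -w tank      # latency histograms, fine-grained enough for microsecond buckets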

Really, it's not that I need a solution, but it would be nice… and if I move my OS to the PCIe SSD it may be difficult to reformat or reconfigure the card, so I wanted to see if there was a solution before I end up locking the system into a certain configuration for a while, until I solve the dual-boot / backup-OS setup…

On another note, my SLOG usage is only partly for added data integrity; the main purpose is to improve HDD performance in my pool, since I run with sync=always. That limits database data fragmentation by turning random writes into more sequential writes, which improves both write and read latency on the pool.
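
For reference, that's just the standard dataset property (pool/dataset name is a placeholder):

    zfs set sync=always tank    # force every write through the ZIL, and therefore the SLOG
    zfs get sync tank           # verify the current setting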

Really, the SLOG is only useful when data would otherwise be lost, and though I have often cut power to my server without shutting it down, maybe 70-100 times over the last 7 months, most often while the storage node was running… I haven't lost any data, even while running a device without PLP as my SLOG device…

Keep in mind that with copy-on-write a SLOG isn't really required… because of the way it works, data corruption is highly unlikely… so that leaves the data lost before it hits the drive, and there we are talking milliseconds, which is a few kB of data even at reasonable speeds… sure, it could be important, but also remember that sync writes usually take priority over other writes and thus get the lowest possible latency, so important DB changes will be written even faster than a few milliseconds… unless of course the device is overloaded, but let's assume everything is running within spec at least…

So we have a few kB that didn't get written… but in that case, because they were sync writes, we won't have reported back to the sender that they were written, and so the integrity of the data chain is still good… sure, we might also have dropped some async writes in the kB we lost…

But how important is that? If the system knows it crashed, then when it boots it resumes from where it left off… the only thing required to handle this is that when the node goes offline, the last few async writes are no longer regarded as saved, on the satellite's side…

And if the data was really important, it should have been sent as a sync write, in which case it would only be reported as written once it actually was… which could serve as a check verifying that the data chain of whatever was sent over the last little while is all stored and intact…

Maybe in some extreme cases… with something very poorly programmed, which requires that a byte is never, EVER in the wrong place… then sure, it might not be a good idea…

And now with my new SLOG I'm down to… 19 microseconds, which I think turned out to be 15 microseconds,
of baseline latency before something is usually saved… unless it's more data than can realistically come through my internet connection… or very high IOPS.

So yeah… I don't really see a big problem… and I certainly haven't had a problem, even though I've been so mean to my node and server… I did kill netdata one time when I ran for a week with the server doing random reboots a few times a day.

So yeah, sure, some fragile programs can be affected… though I can't really be sure that was the cause… and I have yet to see a checksum failure that wasn't because I pulled a drive or did something crazy.