ZFS discussions

i didn’t run hardware raid, i just mixed sas and sata in the same array… tho they were also very likely on the same SFF-8087 port and on the same backplane, sas and sata together…
so i’m not 100% on what actually fixed the issue, but the guy that explained it basically said one shouldn’t mix them in arrays in general… may or may not apply to zfs pools… now that i think about it… i’m not even sure i actually had them mixed in pools, because i wanted the drives to all be the same so they didn’t have mechanical or software offsets that could push them out of sync and thus create extra latency and wear.
was a painful troubleshoot tho…

it’s not that it doesn’t work… it’s that most basic stuff ends up being so complicated, but that might also be down to me working on windows for a couple of decades… all the possible settings and stuff you can do in linux sort of make windows look like iOS in comparison…
so much stuff in linux you mostly never have to deal with in windows, because it all runs on auto…
but that’s what happens when something is used by that many people and so much money is pushed into the development.
windows as a hypervisor tho… that’s a joke, i spent weeks trying to pass through a usb port…

maybe i misunderstood how that spanning tree thing works then… i understood it as building a sort of table of MAC addresses to avoid sending data back and forth over the same cable multiple times…

pretty sure my vlan setup is bouncing data back and forth at least two times between the switch and the vmbr in the server, thus the 1gbit network connection ends up being able to do 160-170mbit
which is really annoying…
STP sounded like it was meant to fix that… but i haven’t gotten it working yet, so maybe not… vlans get really complicated… i should have just pulled another cable lol

uhhhh was checking up on L2ARC and it seems ZFS 2.0 has been released…
i’m so going to hack-upgrade my proxmox zfs version
PERSISTENT L2ARC
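for reference, a quick sketch of how one could check whether the installed zfs actually supports it (the module parameter name comes from the OpenZFS 2.0 persistent L2ARC feature; the exact pool/device names below are placeholders, not my verified setup):

```sh
# show userland + kernel module versions; persistent L2ARC needs >= 2.0
zfs version

# persistent L2ARC rebuild is controlled by this module parameter (1 = enabled)
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled

# the cache device itself is added the usual way (hypothetical pool/device names):
# zpool add tank cache /dev/disk/by-id/nvme-examplessd
```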

Hyper-V works fine and newer versions of Windows can be used to build a cluster, probably more easily than the same with Linux.

The problem with “auto” is when it breaks, it creates bigger problems.

That’s just a MAC table on each switch.
STP works by building a “tree” and disabling redundant connections. For example, if you connect three unmanaged switches in a circle, you get a loop and the network pretty much stops working. STP would detect such a situation and disable one port, so that the network works properly. However, if one switch goes down (or one cable breaks) then STP enables the previously-disabled port.
Basically, STP is needed if you want redundancy and want to connect switches in loops.
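If you want to check or toggle STP on a Linux bridge (such as a Proxmox vmbr), a minimal sketch — the bridge name vmbr0 here is just an assumption:

```sh
# show whether STP is enabled on the bridge (0 = off)
cat /sys/class/net/vmbr0/bridge/stp_state

# Proxmox/ifupdown2 style: set it on the bridge stanza in /etc/network/interfaces
#   bridge-stp on
#   bridge-fd 15
```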

Unless the VMs are in different vlans (in which case the traffic has to go through the router), it should not happen. It does not happen for me with Linux bridges, considering I get ~16gbps between two VMs on the same host and vlan, while the host is only connected by a 2x1G bond to the switch.
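An easy way to see where the bottleneck actually is, is to measure VM-to-VM throughput directly, e.g. with iperf3 (the IP address below is a placeholder):

```sh
# on VM 1 (receiver)
iperf3 -s

# on VM 2 (sender), same host and same vlan - this should be well above wire speed
iperf3 -c 192.168.50.11 -t 30
```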

i tried using it and it was pretty okay, but i was unable to pass through the usb for my 3DP to a vm… tried for a long time and read a lot of obscure things about it…
over a period of nearly months, but at the very least intensively for a couple of weeks.

switched to proxmox and it was 3 clicks… so yeah, i’m sure it works fine for some things. also this was a year ago now… if not more, so i’m sure much has changed in hyper-v
i cannot imagine that usb thing would continue to be that difficult…
but there were many other reasons i was starting to hate windows…

true

i’m aware of mac tables, i mean the STP MAC tables for routing between managed vlan switches, without passing over the same cable multiple times… (i dunno how those work and i shouldn’t really need them… but i can see they can be configured and the switch manual talks about them)

the loop protection actually sounds more like the trunks… from what i can understand the STP is for avoiding vlans sending data over the same cable multiple times to get to another vlan, or at least that’s what the sources i’ve been reading tell me…

and my switch at least calls grouped ports for improved bandwidth and redundancy “trunks”, but i really don’t know, i made my first vlan like a week ago… so i’m just trying to deal with the problems as i run into them.
so pretty clueless…

my problem right now as i understand it, is that i’m now in the wrong thread again :smiley:

my vmbr is vlan aware so it will pass vlan tagged packets, but it lets the packets go to the physical switch to be switched / routed or whatever vlan magic it does… thus my 6 vms are constantly sending data to the switch to get vlan routing, when the vlan managed switch should be able to tell my vmbr in the server that they are in fact in the same place… and thus make it perform the vlan switching internally in the server’s vmbr…
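for reference, a vlan-aware vmbr in proxmox is normally defined roughly like this in /etc/network/interfaces (the interface name and vlan range below are just an example sketch, not my exact config):

```
auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
```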

or that’s at least how i understood the switch manual dealing with it, plus another description of the principles of STP / a guide, and it seems to be the same in the proxmox stp guide i’m currently reading…

but maybe i just misunderstood it somewhere, i don’t think so though… i’m sure i’ll find out when i cannot get it to work lol… can’t shut down the network now anyways so will have to wait until tomorrow to implement it…

95% of the network also works fine… it’s just the last little bit… but the configuration is also still kinda rough around the edges… maybe there’s some simpler way to solve the issue.

wrong.

Each switch has its own MAC table and if the traffic is within one VLAN, it should not go through the same cable multiple times.

A trunk is when you put multiple vlans on one port. It does not do anything about loops.
STP can protect from loops and some switches have separate loop protection that does not need STP.

OK, yeah, some switches also call these “trunks”, Linux calls them “bonds” and Cisco switches call them port-channels or etherchannels.

Unless you have a layer3 (routing) switch, a switch does not pass traffic between vlans - that’s the job of a router. And yes, if your VMs are on separate vlans, then all traffic between them has to go through a router. No other way around it, that’s pretty much the reason for vlans - to separate the virtual networks without needing separate switches.
Also, no, a Linux bridge does not work like a router. However, you can make the host work like a router.
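A minimal sketch of what “make the host work like a router” means in practice (illustrative only, not a full config - addressing, firewalling and NAT are still up to you):

```sh
# allow the host to forward packets between its interfaces/vlans
sysctl -w net.ipv4.ip_forward=1

# make it persistent across reboots
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
```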

i think so… might have to look that up tho
can’t find anything in the manual about it… but it sure has enough features… i think i was looking at the stuff about the l3 switches and ended up taking one of those, because it said l2 was limited for advanced vlan setups… and it wasn’t really that expensive compared to the other solutions i was looking at, because i wanted 10-40Gbit uplink… but ended up settling for a 50$ one with just 8 ports, just because it would fix the problem i had and i figured i could skip pulling 50+ meters more cable

weird that the documentation doesn’t say it’s an l3 switch… but it sure got enough router-like settings.
loved that i could use it to check that my cables were good… that was amazing, sadly i don’t have any other cable testing gear… did say there was crosstalk on the ethernet-over-power adapter’s port :smiley: no surprise there lol

might not be l3 but it sounds like it’s close enough for what i need; read up on l3 switches a bit more and this has to be pretty close if it isn’t an l3

and it should be able to tell the vmbr to switch vlan packets internally… else like you indicated it’s software… so i’m sure it’s just a matter of me installing something else, if it won’t do what i want it to.

  1. Are your VMs on different vlans?
  2. If so, why?

Different vlans are meant to keep things separate, but it looks like you want to combine them, and in that case, why use different vlans?

using pfsense, so mainly it was just to isolate the gateway to the isp wan, and i didn’t want to pull extra cables, since if i was to pull new cables for the server room i would do fiber, to get a cheaper, longer-range, faster and electrically insulated connection.

so this was just a semi-easy patch… and then currently i’m running 2x fiber internet, each with their own dhcp, plus the dhcp from pfsense for the lan, so as to keep it all fairly isolated but still have the server have access to it all…

and then i also kinda want to get ipmi working again, but that’s really old and fairly insecure, so i wanted a bit of a management-type network also.

the old fiber internet will expire by the end of the month… so was kinda nice i could easily isolate and still use that, even tho it all shares the same switch… ofc there are many other ways to solve these things… but found vlan and it sounded very useful… i’m sure i will eventually simplify the setup…

You have to consider separate vlans to be separate networks.

If your server is on vlan50 and the internet is on vlan60, a router has to be present on both vlans and the traffic will go through it, even if it means that the packets go server -> sw1 -> sw2 -> router -> sw2 -> sw1 -> uplink, just as they would if what you have as separate vlans were separate switches.

So, if you have two servers that have to pass a lot of traffic between them, they have to be on the same vlan or the traffic will go through your router.

the server port is split into all the vlans via vlan bridges on the vlan-aware vmbr that is connected to the vlan switch… thus it’s basically two or more vlan switches.

i just need to make the server run it internally instead of asking the physical vlan switch about stuff…
today i got the existing network isolated so it’s separate from the mess i got going…
so now i can tear it all apart without disturbing anyone aside from myself and the storagenodes…
makes for much easier troubleshooting when one can actually test stuff lol…

whatever i’ve done sure hasn’t worked so far… so maybe you are right about that loop thing… even tho people and documentation seem to not always be clear on that… lots of other things to try i guess…

haven’t looked at static unicast / multicast address tables yet
and i got some pfsense configuration that might also be to blame for the issue… but at least now i can tinker…

the problem is just my ignorance of this vlan stuff at the moment, and that it’s apparently a bit more complex than first assumed :smiley: and it’s not like it doesn’t work… i just only get 170mbit where i should get near 1gbit, so the signal is bouncing back and forth a few times… that’s just a configuration issue.
annoying… tho

think i’ll give the pfsense part a go… kinda worried that i end up changing so much in my switch that i have to reset it to defaults, and tho i can most likely set it up again in 1/20th of the time or less, it’s not worth the risk if the problem is actually pfsense / the server / the vmbrs / the vlans.
think i got 6-7 vmbrs at the moment… but only using like 5 of them :smiley:

lshw -c network looks like my posts… keep scrolling :smiley:

With vlans, imagine completely separate switches. Say, you have four vlans - imagine having four separate switches that you can connect to. No data flows between the switches, unless it goes through a server (that is connected to more than one switch).

So, let’s say you want two servers that are connected to different switches to communicate. One way to do it is to have a router between those switches - all traffic would go through the router. Another way would be to connect the servers to multiple switches, so that they have a common switch.

But yeah, just imagine vlans as separate switches :slight_smile:


yeah it seems to make pretty good sense… i just cannot figure out why i only get 170mbit internet bandwidth, when i get like 700mbit if i connect directly… feels like i’ve checked everything by now…

i figured it was bouncing back and forth over the cable, going over the same cable 3 times, because i thought the server was asking the switch where all the vlan packets go, even tho it should be able to switch them over its local vmbr.

and that doesn’t seem to be the case either… took some time to try and dig more into it, as i didn’t really have any better information to give…

now i accidentally looked at netdata while running an internet bandwidth test, and i had until now assumed the server’s cable to the switch was the limitation… this doesn’t look like it tho…

maybe it’s time to tear it all down and build it from scratch, and then add the elements and features one by one to figure out what makes this happen…

going in on 1 vlan, hitting the server and going back out on another vlan… shouldn’t really limit anything anywhere… aside from the fact that the fiber internet doesn’t seem to go past 600-700mbit in either direction…

even if we assume it was running half duplex 1gbit, it would be 350mbit throughput on the server… from routing from 1 vlan to the other… over the same 1gbit verified-good cable… it’s just weird.

apparently my switch can figure out communication between vlans, and i have to set up a special private vlan to completely isolate vlan ports into their own isolated network.

and then it talks about 1st and 2nd tiers on the private vlans and how they will interconnect… or something.

anyways, back to hammering on the switch since it’s unlikely that it was actually the server’s fault… since the bandwidth of the cable and its NIC is at like 1/3rd or 1/6th of capacity depending on duplex…

and tho the switch can switch at 16gbit internally, that’s still only 1gbit full duplex between each port, so i can only assume the data somehow bounces around inside the switch multiple times due to my janky configuration… :smiley:
but this is only my 2nd attempt at vlan, 3rd time’s the charm right… so yet another reset to factory defaults coming up.

Has anyone asked @Alexey or anyone else at Storj if the storagenode process cares about atime being on or off at all?

I usually turn it off in fstab on my servers but noticed that I had not turned it off yet for the zfs dataset that storj is on (both tank and tank/storj have it on). I usually turn it off to free the drives from the metadata writes atime causes on touches or reads, for files where I don’t care about it or have other ways to get that timing metadata.
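(For reference, what I mean on the zfs side is just this, using the dataset names above:)

```sh
# check the current setting
zfs get atime tank tank/storj

# turn it off on the parent; children inherit unless they override it
zfs set atime=off tank
```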

I have atime off on my node, seems to work fine.

it’s off on my nodes too. and it is best to have it off because updating the access time on reads means that metadata gets written back to the disk on every read. This is a potential risk for data corruption if your RAM experiences a bit-flip (if, like me, you don’t use ECC…). The risk might be really low but still, why risk it at all? the last read access is irrelevant to anyone, so imho it’s better to just leave the file on the disk without anything being rewritten every time it is read. Therefore I disable atime on all my pools, even for personal data.

(at least that’s what I read in a zfs article when I set my pools up, can’t find it right now.)


to my understanding the only function of atime is tracking when a file was last accessed, it’s an old security measure that, i think, people say is outdated.

it shouldn’t have any effect aside from requiring a bit more iops, due to it having to update metadata on files when they are accessed.
so if it’s small changes to a file, it might be a large portion of the IO, while if you are overwriting a big file it will still just be one IO for the metadata block, while the file might be hundreds…

so it can cost double the iops in some workloads, and less than a 1% increase in others…
i also turned it off, tho i do kinda like it for some stuff… it’s a nice way to track if data that isn’t supposed to be looked at is accessed.

but for storj i would always keep it turned off… for other stuff i might not… but there may be a newer solution, or computers using zfs went from being like a bank vault to a mall… and then the whole concept of atime grew useless and costly.


Hi,

Was looking up some documentation about zfs.
You probably know this, but my motherboard doesn’t support nvme and I was thinking about a slog drive, so here is a list of ssds that are “protected” against power failure…

https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Hardware.html#nand-flash-ssds


Basically, any “server” or “enterprise” drive should have those capacitors. Any “desktop” drive is likely to not have them.

The MX500 1TB is $60 and is on the list. Regular desktop ssd. The difference is its safety is in the firmware, not the hardware…

Just saying it’s good to look at the list.

There are cheap m.2 → PCIE adapters on ebay under $10. No silicon involved – just straight PCB traces. Dumb like rocks.

SLOG is only helpful for synchronous writes. They are a rarity. Unless you are running some extensive database applications or things like Time Machine, you’ll have very few sync writes. The Storj use case requires none of them (in the sense that you can disable sync on the datasets storj is using with no repercussions).
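If you go that route, it’s a one-liner per dataset (the dataset name below is just an example):

```sh
# disable synchronous write semantics for the storj dataset only
zfs set sync=disabled tank/storj
```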

SLOG does not need to be large. $10 16GB Optane is plenty overkill.

In total, you can get a SLOG for your array for under $20.
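Attaching it is also trivial (pool and device names below are placeholders):

```sh
# add a dedicated log (SLOG) device to the pool
zpool add tank log /dev/disk/by-id/nvme-example-optane-16gb

# or mirror the SLOG if you want redundancy:
# zpool add tank log mirror /dev/disk/by-id/ssd1 /dev/disk/by-id/ssd2
```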

I think you should go the opposite route: see which SSDs you can get, and then check their data sheets for PLP. However, note that while Optane does not have capacitors, it is nevertheless fully protected due to its design.

This is incorrect. PLP is usually implemented by adding spare tank capacitance to provide enough power to complete pending writes. There are always pending writes to NAND flash because of necessary batching – otherwise endurance and performance would plummet. So no, software can’t add capacitors.

I’ll reiterate my advice – use Optane. No capacitors, no problems.


Discussion of power failures bricking NAND flash SSDs appears to have vanished from the literature after 2015. SSD manufacturers now claim that firmware power loss protection is robust enough to provide protection equivalent to hardware power loss protection.

Anyway, gonna use it as a special device.
