High Availability Considerations for SNOs
I'm sure all SNOs strive for their storage nodes to have the best uptime possible, which can be quite a challenge on a limited budget with limited hardware. These are some of the considerations, implementations and compromises I've made in the setup of my server.
Electrical supply: I know my supply is highly stable, to the point where my only considerations are adding some mains filters and fuses to protect against overvoltage and the like.
Stable power is critical for the reliable operation of most electronics.
Another thing I have considered is a small UPS-type solution for safe shutdowns of the server in case of power outages, but since I'm using a Copy-on-Write filesystem I should see minimal effect from power losses, so I'll leave that topic to be better explained by those actually requiring or utilizing those power stabilization technologies.
And don't forget grounding. I never really grounded my consumer-grade computers and never had much issue with that, but it can be a critical source of trouble which will rear its head in the weirdest of ways, especially when dealing with long copper network cables, multiple power supplies and various devices hooked up left and right. At best you get errors, at worst something gives up the magical blue pixie smoke... alas, moving on.
Power supplies: I've dealt with a lot of bad power supplies, ranging from cheap ones that would act up at any sign of electrical noise. I'm also told that a large portion of hard drive failures are caused by bad power supplies. Your power supply (PSU) is the last line of defense for your computer; it is the component that keeps everything alive and running smoothly, so having a problem here, or being cheap, rarely pays off in the long run. Some servers have multiple PSUs because they can wear out, especially when overloaded; loading any PSU beyond 75% of its capacity isn't recommended and is rarely worth it... though I have had expensive power supplies that just wouldn't die,
running at 125% of rated wattage, burned nearly to a crisp, running with broken fans and whatnot. It's quite impressive what well-designed gear can take.
Many HA servers will have dual PSUs allowing zero-downtime hot swap, which is nice, but the second PSU adds to both the cost of the gear and the wattage consumed. I'm not aware how bad the added power draw actually is, but I'm confident it's not zero; if I were to hazard a guess, maybe 25-50 watts extra for a dual-PSU solution. And if you have a quality PSU running at 50-60% capacity, PSU failure is a very rare event in my experience.
Networking__________________
A few ground rules: if you are utilizing a UPS solution or other mains power filters, relays, fuses and such to protect your gear from electrical trouble, then this is yet another critical place to pay attention.
When you attach long copper cables to anything you will have voltage differences: basically there is more energy in one place than another, and it will crawl across metal and wires; even materials you might consider non-conducting, like paper, air and plastic, will often allow the flow of electricity.
Of course, as most people know, some metals like copper conduct best, which is why we use them for moving electricity, but that means any electricity. So if the wind blows over your house and your network cable or phone wire runs a long distance, you can in mere seconds build up voltage differentials of thousands of volts. That might just give you a static jolt when it pierces your skin, and your NIC (Network Interface Card) is usually pretty well made to deal with such things, but something like a lightning strike means all bets are off: it will fry your UPS-protected gear no matter the surge protectors and filters you have on it.
The network is your ingress point not only for data, but also a secondary path for killing your server.
The only true way to mitigate that risk is wireless, fiber optics and similar non-conductive solutions.
I know many will frown upon me saying wireless, but WiFi isn't always bad. It really depends on how much radio noise you have, the distance, and what kind of walls are in the way. In some cases WiFi may be a cheap solution, while in others it will be totally useless. Fiber is the preferred professional solution, if possible of course. There is a plethora of variations of these technologies, but it usually boils down to those two.
I must admit I ended up pulling a twisted-pair cable for running 1 Gbit, almost without thinking ahead; I even got cable good enough that I can do 10 Gbit and most likely 40 Gbit.
Alas, with 20/20 hindsight I should have set up a fiber connection with two switches with fiber uplinks at either end. That would have been the sensible solution, and most likely also cheaper, because 10 Gbit and beyond over copper twisted pair (what most people just call "ethernet") is simply ridiculously priced. You are most likely better off avoiding it and doing fiber uplinks instead, maybe combined with multiple 1 Gbit connections: most switching gear can handle many, many Gbit, so the cheap way is to hook 4x 1 Gbit connections into the switch and then uplink that over 10 Gbit fiber. Not the best solution, but it works if you have the network gear for it.
Of course it won't cut down on your power bill, I bet, but it's an easy patch, and really, how often do people need more than 4 Gbit of bandwidth?
This also gives you some failover in case a NIC goes bad, though again that's pretty rare on quality gear.
We will touch more on network failover when we get to the actual configuration of the HA server itself.
The HA Server Hardware.
Servers are generally built for specific purposes. Much like regular consumer computers there will be all kinds; just like cars, many are purpose-built but still basically the same.
Most servers will have some integrated HA features that the enterprise and prosumer markets have demanded over the decades since the rise of the internet, and others may be designed with HA in mind from the start.
Generally HA boils down to redundancy and quality, which in the long run means the system keeps running until it ends up a practically useless zombie that only collectors will want, assuming it is of proper quality in the first place and isn't replaced for other reasons before that time.
I'll be describing the HA built into my 2U rack-mounted server, in general terms. Let's stick to going through this systematically and work our way from the outside in.
Cooling: most of a server's components will have passive cooling radiators (fins, if you like) and are thus basically maintenance-free. The airflow is then provided by a set of 3 to 5+ powerful fans that are easily replaced, because fans have a tendency to wear out, especially when run near their maximum recommended speeds. With 3-5 fans you can afford to lose a few without the system ending up with no cooling, which makes for a stable cooling platform.
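A quick way to keep an eye on those fans and temperatures from the OS; a minimal sketch assuming the lm-sensors and ipmitool packages are installed and the server has a BMC exposing its sensors:

```
# read fan speeds and temps from the motherboard sensors
# (a one-time "sensors-detect" run may be needed first)
sensors | grep -iE 'fan|temp'
# on servers with a BMC, ipmitool reads the same sensors out of band
ipmitool sdr type Fan
ipmitool sdr type Temperature
```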
RAM: servers usually utilize ECC RAM, which enables them to cheaply and quickly perform scrubbing of the memory. This feature corrects bit flips and similar corruption of in-memory data, which can happen from time to time, and it also allows the server to basically retire a memory block or even a whole RAM module should it go bad. I'm not really that familiar with that feature, but I suppose there must be spare capacity somewhere for it to fall back on; yet another redundancy feature to allow for increased HA.
On top of this come whatever options you set in your BIOS configuration, which is also where the scrubbing features are turned on and off. Scrubbing takes some work, so for high-performance servers people might turn it off. For SNOs the general recommendation is patrol scrubbing, which periodically scrubs memory in the background. The other option is demand scrubbing, which scrubs any accessed data when it is requested, giving an even larger performance penalty; that might be relevant for some workloads, but it is in the most practical sense irrelevant for SNOs, so leave it disabled.
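On Linux you can check that ECC error reporting is active and whether anything has actually been corrected; a minimal sketch using the standard EDAC sysfs counters (they should normally stay at 0):

```
# corrected (ce) and uncorrected (ue) error counts per memory controller
grep -H . /sys/devices/system/edac/mc/mc*/ce_count \
          /sys/devices/system/edac/mc/mc*/ue_count
# kernel messages also show whether the EDAC driver loaded for your chipset
dmesg | grep -i edac
```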
On top of all this there is the RAM spare function, which is basically the RAID 5 of RAM. I assume the ECC features also play a part in it, but that's not really relevant for us to know, other than that I don't think you can run this feature with regular non-ECC RAM. The spare function basically reserves the modules on one channel as a spare and, as far as I understand it, uses parity math to recalculate lost data. Sadly this means you always sacrifice a third of the installed memory capacity for the feature, but with it enabled I should be able to hot-pull 4 of my 12 RAM modules while the system is running and the system wouldn't care, aside from howling with alarms and some decrease in overall performance.
Personally I think this is overkill for an SNO, but if your system is in a hard-to-reach location, or you travel a lot, it might be a good choice. It's a costly feature though. It does provide yet another layer of redundancy against data corruption, but I've never had much trouble with RAM, so disabled it is for me.
I might enable it if I left for a month, though.
Networking HA Segment 2
Most servers have at the very least 2 NICs, often located on two different chips or chipsets, to provide yet another failover option for the correctly configured server. In my case I have 4 NICs, 2 on their own chip and one on its own chipset. More modern servers would most likely also have optical ports, but my server is simply that old... lol
One can aggregate two connections into one with options like Microsoft's multiplexor feature or bonding in Linux. These options are pretty "easily" configured these days, but when it comes to load balancing and such they are not always optimal; it depends a bit on the luck of the draw, and I haven't been very impressed with Microsoft's multiplexor, or whatever it's named. Personally I would recommend a setup that keeps an eye on the connection and, if it goes down, fails over to the NIC located on the other chip or chipset, in case the first chip/chipset is damaged or otherwise inoperable.
In theory the aggregated, load-balancing NIC setup is preferred, because it doubles your bandwidth in and out of the server while giving you the same redundancy, when it works correctly. In some poorly configured cases one NIC going down will take the other's connection with it, which basically means that instead of halving your potential NIC downtime you have doubled your odds of something going wrong. So be sure it works like it's supposed to when using these load-balancing multi-NIC solutions.
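For the plain failover case on Linux, an active-backup bond needs no switch support; a minimal sketch using iproute2 (the interface names eno1/eno2 and the address are placeholders for your own):

```
# create an active-backup bond: traffic uses eno1 until its link drops, then eno2 takes over
ip link add bond0 type bond mode active-backup miimon 100
ip link set eno1 down; ip link set eno1 master bond0
ip link set eno2 down; ip link set eno2 master bond0
ip link set eno1 up; ip link set eno2 up; ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0    # placeholder address
cat /proc/net/bonding/bond0              # shows which slave is currently active
```

These commands don't persist across reboots, so you would normally put the equivalent into your distro's network configuration (netplan, NetworkManager, systemd-networkd, etc.).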
Using a local DNS name for the server helps make sure that no matter which IP the server currently has, traffic finds its way to the correct address for your storage node. Of course this is highly dependent on your access to your router/DHCP server and its support for local DNS names.
It can be rather useful when dealing with load balancing and failover, since in such cases IP addresses might be a bit in flux.
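Purely as an illustration, and only applicable if your router or DHCP server happens to run dnsmasq: a local name can be pinned to the node like this (the name, IP and file path are placeholders):

```
# map a stable local name to the node's address on a dnsmasq-based router
echo 'host-record=storagenode.lan,192.168.1.10' >> /etc/dnsmasq.d/storagenode.conf
systemctl restart dnsmasq
```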
CPUs
The CPU, in my experience, rarely goes bad; kind of like RAM, modern chip technology is pretty reliable. Of course I will assume nobody buys first-generation tech, because new stuff always has gremlins, so look out for that if you are thinking in HA terms or simply like your tech to work.
Corporations have a tendency to use end users as beta testers, and even though this is sort of bad for business, it can be difficult to test at a wide enough scale to avoid it.
So assume the first generation of any new tech will be buggy and avoid it if possible.
"Many" servers will come with 2 CPUs, and though this is mostly for added performance, it can also provide additional redundancy for your HA system. Granted, this is not always the case, so be mindful of that; if it is a consideration for you, make sure each CPU connects individually to the rest of the system. This doesn't make either CPU truly redundant: they work in harmony, sharing RAM and data with each other. However, if one of them fails critically, a hard reboot should bring the system back online in most cases. This does require things like NUMA and maybe special RAM interleave configurations, so that the system is allowed to assign some RAM to individual CPUs, basically creating a virtual computer within the system itself that ignores the components it isn't talking to; those components, if corrupted or down, can then be disregarded and won't take the rest of the system down with them.
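If you are on Linux and want to see how cores and memory are actually split across the two sockets, a quick check (numactl comes from the distro's numactl package):

```
# list the NUMA nodes, which CPUs belong to each, and how much RAM sits on each node
numactl --hardware
lscpu | grep -i numa
```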
Which leads us to the next logical step of HA.
BIOS Options and Configuration.
I will be going over some of the things we have already covered, but for future searchability and as a quick reference for good procedure, I will briefly go over them again.
The watchdog: any HA-minded server will come with a hardware watchdog. The dog is fed by something, in my case the OS, but it's easy to adapt it to be fed by whatever process you care about. If the dog isn't fed, it will perform an action, such as a hard reset of the system, or turning it off and back on again, or something in that vein.
There are usually a few of these options and I'm not sure which is best, but I would prefer a hard reset, so that power isn't cut to the system. In some cases a full power cycle might be wanted, but imagine the system running into some issue that won't go away: with a power cycle it will sit there turning itself on and off, maybe every 2-5 minutes, so up to 12 times an hour and around 288 times in 24 hours.
Now let's imagine you are unable to get to it for 36 hours: it will have spun the HDDs up something like 500 times, while with a hard reset they would just have kept spinning idle, which is not nearly as bad. The tiny motors in an HDD draw around 10 times the power on spin-up, and spin-up is a critical point of failure; I suppose this really belongs in the section about HDDs, but I wanted to give an example of why I prefer a hard reset over a power cycle. A power cycle could quite possibly break your machine for no good reason at all aside from a bad configuration, while a hard reset will at worst give you downtime.
The one major disadvantage of the hard reset is that problems a full power-down could solve will not be solved.
However, I digress...
The watchdog will basically reboot your system if any issue arises where the BIOS/BMC (which is essentially a computer system within your computer system) loses contact with your operating system, and thus resets the system. This would also work in case of CPU failures or issues like those touched upon previously in the CPU section.
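On Linux the simplest way to get the dog fed is to let systemd do it; a minimal sketch, assuming the hardware exposes /dev/watchdog through a loaded kernel module (iTCO_wdt, ipmi_watchdog or similar):

```
# in /etc/systemd/system.conf set:
#   RuntimeWatchdogSec=30s    # systemd pets /dev/watchdog; if PID 1 hangs, the hardware resets the box
systemctl daemon-reexec            # apply the change without rebooting
wdctl                              # show which watchdog device/driver is in use and its timeout
journalctl -b | grep -i watchdog   # confirm systemd picked up the hardware watchdog
```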
There are also configuration options to split your QPI (QuickPath Interconnect) links, from running 20 bits across all lanes to using smaller groups of lanes individually at 5 bits, which would protect against individual lanes failing. I haven't set this up myself, but I might test it out in the future. Let's be honest though: it's basically a bus, i.e. wires inside the circuitry of the motherboard, so if your computer is running at the North Pole and you can only come repair it every 6 months, maybe I would use this.
It's yet another HA option that Intel provides in this case. I'm not very familiar with AMD's equivalents; I'm sure they're good, kinda. I like that they sort of just said "hey, let's print more cheap dies and slap them onto big packages so we can get ridiculously many cores", and Intel was like... YOU DID WHAT!!!
But AMD is still kind of the underdog in many other aspects, so for HA I would stick with Intel, or better:
the $50k PowerPC chips NASA usually uses...
Wait, what was I talking about again...
BIOS, right.
Compiling this has become a bit of a project, with lots of distractions, long pauses, multiple days and a bit of research going on behind the scenes, but mostly this is off the top of my head and relies on my understanding of these topics. I would of course like it to be as accurate as possible, so if you think you can empirically prove something wrong, I'll be willing to look into it further.
Like most people, I know that one has to be wrong plenty of times in order to be right... xD
Back to BIOS
This should really be part of the watchdog section, but then I'd have to start copy-pasting...
Restore on AC Power Loss, or so it's called in my BIOS; it might go by many different names.
For this I usually like to use Last Power State (or just Last State, as it's called in this case at least).
This makes sure that if you turn off the server it won't spring back to life after a power outage, or when you disconnect and reconnect a power cable or whatnot, which is kind of nice. It will also spring back to life after a power outage if it was turned on when the power went out, which is also pretty nice. I'm not a big fan of the other options. I am actually running the default of "on" right now; I thought that would keep turning the server back on even if it had been deliberately turned off, but from how the name of the BIOS option sounds, I kind of doubt that now. Maybe I should have read it when I set it... lol. I'll give it a test soon, that's for sure. I've had an issue with the server just shutting down randomly, which turned out to be some sort of power conservation feature I had turned on, but to try to remedy it I figured I would try an option other than Last State, which has been my preferred BIOS setting for maybe a decade now; whenever I go through a BIOS I set it to that, because why not, it's the most sensible option.
Boot devices…
I would recommend setting your primary boot drive, your secondary boot drive and whatever other redundant boot devices you might have, though personally I stick with just primary and secondary.
These days I split them so one is on the HBA and one is on the onboard motherboard SATA/SAS controller.
That costs some extra bandwidth on the bus, but if either controller fails the other boot drive will be picked up during reboot. The main reason to disable all the other boot entries is that if you add a drive while the system is running, the boot order can get displaced and the system might be unable to boot if it crashes.
Thus I would recommend booting primarily off the onboard motherboard controller, and then off an HBA-based drive in case the first one fails.
Personally I like to have a boot drive not located on the HBA, basically anything directly on the motherboard. I also don't like booting off RAID arrays, in case I have to find and correct issues with the RAID; that way the RAID can fail without the OS being affected. Running the OS on a mirrored array can be a very good idea though, especially if you set the mirror up across different controllers, as that gives you a few more options for added redundancy.
(Note that a mirror isn't parity RAID; the system/controller just copies/mirrors/clones the data onto both drives, so either drive will work fine on its own. One thing to keep in mind here is that a bad drive in a mirror can greatly decrease performance.)
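A minimal sketch of such an OS mirror with Linux software RAID; the device names are placeholders, ideally with one disk on the motherboard controller and one on the HBA:

```
# build a RAID 1 mirror for the OS from two partitions on different controllers
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
cat /proc/mdstat          # confirm both halves show [UU] before relying on it
mdadm --detail /dev/md0   # per-device state, useful when one controller acts up
```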
And remember to enable NUMA and patrol scrubbing (if you have ECC memory).
I'm sure there is lots of more detailed advice out there; these are just the things I have learned to account for, so those are my recommendations for HA considerations.
Storage.
I've chosen to go with ZFS for my storage node / server, and I would recommend it for anyone who doesn't mind things getting quite technical. For those of you who want to keep it easier to manage, I would go with RAID 6; sadly you will need 5 HDDs, and more like 8, else you are most likely better off just running multiple nodes, though that doesn't really make them HA. You could do a RAID 5 with 3 HDDs, but RAID 5 is quite flawed, and really your safe choices are either mirrors or RAID 6. You also need to be sure your RAID controller has either a battery or flash memory to protect against write holes from power outages...
...something ZFS solves by being CoW (Copy on Write). Basically it keeps pointers and doesn't overwrite data in place: it copies or appends the new data and finishes by updating the pointer. So if you lose power in the middle of something, the pointer isn't updated and still points to the old data; you may have lost the write that was in flight, but you didn't corrupt your data. That is the future of any filesystem; anything less is simply archaic by now.
After a long study of RAID, I would say a pool of two 8-drive raidz2 vdevs is the layout I would recommend for a storage node, and anything less... well, sorry, it might not be HA, or certainly isn't.
Of course we live in a world of compromises, and I am currently running a pool of two raidz1 vdevs:
two vdevs for double the HDD IO, and raidz1 for some redundancy. But raidz1 is kind of dangerous, so I wouldn't recommend it; let's leave it at that. Raidz2 with 8 drives per vdev should be much more redundant than any well-monitored system needs to be. Of course I'm only two months into using ZFS, and only a few years into really using RAID, so it's not really my place to tell you raidz1 is safe, even if I kind of think it should be.
But raidz2 is quite safe, so let's call that HA for a storage node. It also buys you some time to replace a drive, even though you really shouldn't wait: a broken drive should trigger a global hot spare resilvering.
Anything less is asking for trouble, and if you don't replace a failed drive you are asking for your array to fail, plain and simple...
...he said, without a hot spare for his raidz1... xD
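A minimal sketch of the layout recommended above, two 8-drive raidz2 vdevs plus a global hot spare; the pool name and disk names are placeholders, and in practice you would use /dev/disk/by-id/ paths:

```
# pool of two raidz2 vdevs (8 drives each) sharing one hot spare
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh \
  raidz2 sdi sdj sdk sdl sdm sdn sdo sdp \
  spare sdq
zpool set autoreplace=on tank   # optional: rebuild automatically onto a new disk in the same slot
zpool status tank               # verify both vdevs and the spare are online
```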
Alarms and Monitoring. - the death of hardware…
This is turning out to be a bit more extensive than I first assumed...
It seems we have finally come to the end of the hardware considerations; however, a proper HA hardware setup is only half the battle, if even that. This is a war on downtime, and really the primary causes of downtime are the things we didn't account for. We also have the environment around the server to deal with, as it can be just as big a contributor to overall system downtime.
Inside the system we have many different components that need to run like clockwork, and to be sure they do, we will need some monitoring of those components.
We want to monitor CPU temps, HDD temps, latency, fan speeds, HDD SMART data and RAID array status.
These values we log for future reference so we can troubleshoot. Though the logging isn't strictly required, it can be very useful for predicting something like a disk failure, for instance when a drive's temperature drifts outside its usual range or into non-recommended levels. That way we gain the ability to spot potential problems ahead of time, such as dust filters getting clogged and the system creeping up in temperature.
Of course it's impossible to keep track of all of this manually, which is where alarms come in: once we have determined acceptable tolerances, we set up alarms which should be emailed or sent by SMS, preferably from a remote system that concurrently tracks internet downtime and the like.
These features can of course mostly be integrated into the system itself, with downtime tracked by some other service. We want notifications of unwanted behavior, unscheduled reboots and such, but not so many notifications that we end up ignoring them; alarms are only worth anything if we listen to them.
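As a starting point, here is a minimal sketch of such an alarm for drive temperature and reallocated sectors (the 45 C threshold and the mail address are just examples; it assumes smartmontools, a working mail command, and root privileges, e.g. run from cron):

```
#!/bin/bash
# warn by mail if any drive runs hot or reports reallocated sectors
MAILTO="admin@example.com"
for disk in /dev/sd?; do
  [ -e "$disk" ] || continue
  temp=$(smartctl -A "$disk" | awk '/Temperature_Celsius/ {print $10}')
  realloc=$(smartctl -A "$disk" | awk '/Reallocated_Sector_Ct/ {print $10}')
  if [ "${temp:-0}" -gt 45 ] || [ "${realloc:-0}" -gt 0 ]; then
    echo "$disk temp=${temp}C reallocated=${realloc}" |
      mail -s "drive warning on $(hostname)" "$MAILTO"
  fi
done
```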
It can also be easy to make the system so redundant and HA that it just keeps spinning without us having to do anything, but once the redundancy is worn down it will eventually die hard if we don't have proper procedures in place.
HA and automatic updates…
I know Storj promotes automatic updates, and the system seems to run fairly smoothly, but it's difficult to argue against the fact that systems rarely become unstable on their own; it is most often related to updates, or updates failing.
Segments being added in the future…
snapshots / bootloader
OS
(revision 1.0 - sorry, this is still a bit of a mess; I'll attempt to make it easier to get the gist of it while scrolling through.)