PSA: NUT, CyberPower UPS, and FreeBSD

If you are not running storagenode on a FreeBSD system connected to a UPS (especially a CyberPower one) managed by NUT, this does not apply to you and there is no reason to continue reading.

The problem

This is a “perfect storm” of circumstances that may result in data loss: when the UPS battery depletes, power gets yanked before the system has finished shutting down.

  1. NUT is designed for Linux, where by the time the shutdown -h command completes, the filesystem is synced and remounted read-only, at which point it’s safe to pull power. FreeBSD does not remount the filesystem; the kernel takes care of flushing caches and finalizing the filesystem after the shutdown command completes. So when NUT tells the UPS to cut power as soon as the shutdown command completes, the result is an unclean shutdown. The offdelay parameter defines the delay in seconds between the UPS kill command being issued and power being pulled. By default it’s 20 seconds, and unfortunately this is not enough for most servers.
  2. CyberPower UPSes, in particular, treat the offdelay parameter differently: they convert it to minutes and round down. Hence the default 20-second value means zero in CyberPower’s world. It must be overridden to at least 60 to prevent the UPS from yanking power immediately.
  3. Separately, relying on the UPS’s critical battery alarm to shut down the server is a non-starter: most UPSes, especially once the battery deteriorates, do not provide enough runtime after the alarm to shut down the server safely. Most available CyberPower models don’t provide a way to calibrate the battery automatically and/or periodically, and when I contacted support asking how to do it, the rep spent 10 min searching for something, then claimed “this information is proprietary” and hung up. Therefore we want to configure shutdown manually, by remaining battery percentage or runtime thresholds.
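
To make the rounding in point 2 concrete, here is a minimal Python sketch of the conversion as described above. It only models the behavior reported in this post (seconds converted to whole minutes, rounded down); it is not taken from CyberPower firmware or NUT source, and the function name is made up for illustration.

```python
def cyberpower_effective_offdelay(offdelay_seconds: int) -> int:
    """Model of the behavior described in this post (an assumption, not
    vendor documentation): the UPS keeps only whole minutes of offdelay.
    Returns the delay, in seconds, that would actually be applied."""
    whole_minutes = offdelay_seconds // 60  # round down to whole minutes
    return whole_minutes * 60


if __name__ == "__main__":
    # 20 is the default offdelay mentioned above; 60 and 120 are overrides.
    for requested in (20, 59, 60, 119, 120):
        applied = cyberpower_effective_offdelay(requested)
        print(f"offdelay={requested}s -> applied delay {applied}s")
```

Under this model anything below 60 collapses to no delay at all, which is why the default is a problem on these units.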

The solution

Add the following to ups.conf

Configure runtime monitoring

Ignore the low battery state reported by the UPS and instead go by remaining runtime and state of charge. In this example, we set a 20% low battery charge threshold and a 5 min remaining runtime threshold; shutdown is triggered by whichever threshold is reached first.

ignorelb
override.battery.charge.low = 20
override.battery.runtime.low = 300
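
As a rough illustration of what these three lines ask for, here is a simplified Python model of the decision. It is a sketch of the intent (low battery is declared when either value drops below its threshold), not NUT's actual driver code; the function name is hypothetical, and the use of strict less-than comparisons is my assumption.

```python
def is_low_battery(charge_pct: float, runtime_s: float,
                   charge_low: float = 20, runtime_low: float = 300) -> bool:
    """Simplified model of the ignorelb thresholds above (an assumption,
    not NUT source): the UPS's own low-battery flag is ignored, and low
    battery is declared when either reading falls below its threshold."""
    return charge_pct < charge_low or runtime_s < runtime_low


if __name__ == "__main__":
    # Healthy battery: plenty of charge and runtime.
    print(is_low_battery(charge_pct=80, runtime_s=1500))
    # Worn battery: charge still looks fine, but runtime is under 5 minutes.
    print(is_low_battery(charge_pct=45, runtime_s=240))
    # Charge under 20% even though the runtime estimate is optimistic.
    print(is_low_battery(charge_pct=15, runtime_s=600))
```

The second case is the one that matters with an aged battery: the charge percentage alone would not have triggered a shutdown in time.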

CyberPower-specific offdelay override

CyberPower UPSes divide the value by 60, round down, and use the resulting number of minutes as the delay. In addition, on some models the ondelay parameter must be set to zero to ensure proper power-up behavior. (What a crock of shit those devices are… No more buying them. But they are cheap.)

ondelay=0
offdelay=120

These values result in a 2 minute power-off delay (120/60) and a 10 second power-on delay when power is restored (10 seconds is evidently the internal UPS default, at least on my specific model).
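
If you want to sanity-check a UPS section before restarting NUT, something like the following Python sketch can help. It treats a single ups.conf section as INI-like text, which works for simple files like this one but is not a full ups.conf parser (global directives placed before the first section would not parse). The section name `cyberpower`, the `driver = usbhid-ups` and `port = auto` lines, and the `check_section` helper are assumptions added for illustration; the 20-second fallback is the default mentioned earlier in this post.

```python
import configparser

# Hypothetical sample; the section name, driver and port are assumptions.
SAMPLE = """
[cyberpower]
driver = usbhid-ups
port = auto
ignorelb
override.battery.charge.low = 20
override.battery.runtime.low = 300
ondelay = 0
offdelay = 120
"""


def check_section(text: str, name: str) -> list:
    """Return a list of warnings for one UPS section, based on this post."""
    # allow_no_value lets bare flags such as 'ignorelb' parse as keys.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    section = parser[name]
    problems = []
    if "ignorelb" not in section:
        problems.append("ignorelb is not set; the UPS low-battery flag is used")
    # Fall back to the 20 s default described in the post.
    offdelay = int(section.get("offdelay", fallback="20"))
    if offdelay < 60:
        problems.append(f"offdelay={offdelay} rounds down to 0 minutes")
    if "ondelay" in section and int(section["ondelay"]) != 0:
        problems.append("ondelay is non-zero; reported to misbehave on some models")
    return problems


if __name__ == "__main__":
    print(check_section(SAMPLE, "cyberpower") or "no problems found")
```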



I asked them what the recommended battery replacement interval is if you have very rare or no power outages. They said 2-3 years. So don’t expect it to last like a car battery, 5+ years.
Even APC has similar intervals. I experienced an unexpected shutdown of an APC Smart UPS 750 after 2 years, with a power draw of 50-60W.

Another thing… why set ondelay to zero?
If the power comes back and then goes down again after a minute or less, because Johnny plays with switches at the power plant, it can catch your OS in the middle of startup, or nodes just starting up and updating or something. If the UPS battery is already depleted, it will be a sudden shutdown for both the UPS and the PC.

I would set ondelay to at least 10 min, so the battery can at least take on some charge, and they have time to smack Johnny on the head.

I switched to CyberPower only because they are the cheapest sine wave option that works and is not brandless Chinese crap. They are not perfect but are more than OK. The problem I have with them is the lack of replacement batteries.


Yes, these batteries don’t last. They are always overcharged and rarely discharged. I replace them every two years. You can get AGM batteries optimized for UPS use from companies like Power-Sonic, but it does not drastically change longevity.

Still, a UPS is expected to run calibration cycles periodically to adjust its battery runtime prediction. APCs do. CyberPower ones, in the same price range at least, don’t. So you can’t rely on their low battery alarm at all. That’s the reason for the first change.

Because if you set it to anything else it does not work at all. It’s a CyberPower bug. With it set to 0 you get a 10 second power-on delay. If you set it to anything else, the UPS misbehaves. For example, this one does not power-cycle the load if power returns too soon. It’s clearly a software bug, but they clearly don’t give a shit, because it’s been there for years. Hence the second change above. If you get power back shortly after the UPS discharges, well, that sucks.

Exactly. But the amount of time it took me to figure this shit out completely wiped out any savings.

My advice: stick to APC. My 18-year-old APC SmartUPS still works, and works correctly, with all the right behaviors on power restore and timed power off. They still sell it pretty much unchanged; it’s 3x more expensive than a comparable CyberPower and is worth every penny. (It still requires increasing the power-off delay though, due to the FreeBSD shutdown behavior difference.)


I wonder if there is anything else worth the money besides APC? I know about Eaton and the well-marketed Chinese TED, but I don’t have any experience with these.

I do not think so. Learned it the hard way.