Homebridging the alarm

My home alarm is a Pima FORCE panel, a popular Israeli unit installed in a lot of apartments and houses across the country. The installer paired it with a Control4 controller and pitched the combination as the all-in-one brain of the smart home; in practice we’re an Apple household and Control4 is, well, not. We ended up doing everything in HomeKit, and Control4 was left handling the two integrations HomeKit couldn’t reach: climate, and the alarm. Eventually I spun up a Homebridge container and integrated a plugin for CoolAutomation, a device that hooks into the VRF system to bring climate into HomeKit. The alarm was the last thing keeping Control4 plugged in.

Pima also ships a mobile app for the panel. I tried it. It’s bad. The Control4 app works, but it’s slow, sometimes confused about state, and not an app I’m going to dig out of a folder every time I leave the house. I wanted to use Siri voice commands to arm and disarm the system, and to use it in automations like the rest of our smart appliances. It HAD to be in HomeKit.

I searched for a Homebridge plugin and didn’t find one. A few half-finished projects exist for older PIMA models, some Home Assistant integrations of varying decay, a forum thread or two of people asking the same question. So instead of contributing to something, I had to write something. Reverse-engineering a closed-protocol box dialling out from my own LAN is the kind of side project I enjoy more than I should, so I took it on. By which I mean, I asked Claude Code to do it for me.

Capturing the wire

The first thing I needed was the protocol. A Chowmain driver was already bridging the panel to my Control4 controller over plain TCP on the LAN, so the traffic existed; I just needed to see it.

My switch is a UniFi USW-Pro-8-PoE, which supports port mirroring. The first capture showed only my Mac’s own ARP requests. After half an hour of confusion I learned that on this switch, port-mirror destinations on the SFP slots silently don’t work — only copper does. I moved the Mac to a copper port, re-provisioned the switch, and the panel and the C4 controller started chatting in front of me.

The conversation was plain TCP, plain JSON, roughly one frame per message. The panel is the TCP client; it dials out to the C4 controller (or in my case, to my dev machine), and once connected it sends null heartbeats every few minutes and the occasional event frame:

{
  "frame_type": "event",
  "counter": 42,
  "account": "1234",
  "type": 401,
  "partition": 1
}

That’s CID type 401, armed by a local user, partition 1. Someone armed it at the keypad. The receiver answers with an ACK, and the panel moves on.

Getting the ACK right

Claude’s plan was: Node TCP server, parse the JSON, send back an ACK. We had the wire shape on tape from the capture, so Claude went ahead and implemented it.

The panel NAK’d us. {"frame_type":"NAK","data":"JSON frame"}, then sixty seconds of complete silence, then a redial, then another NAK. Claude diffed the ACK against the working one from the Chowmain capture and found three differences.

First, the ACK was missing a "kc":1 field. We had no idea what kc stands for (“keep connection”?), but the panel rejects any ACK without it.

Second, the field order matters. The panel doesn’t just parse the JSON; it appears to look for the fields in the order it expects them. The working ACK has account, counter, frame_type, kc in that exact order. Reorder them and you get NAK’d.

Third, and worst: the panel sends account as a string in its events ("account":"1234"), but the receiver has to echo it back as a number ("account":1234). We were passing it through unchanged, which meant sending strings, which meant the panel was NAK-ing us.

This is the shape that actually works:

export function ackFrame(received: PanelFrame): Record<string, unknown> {
  return {
    account: Number(received.account ?? 0),
    counter: received.counter ?? 0,
    frame_type: 'ACK',
    kc: 1,
  };
}

Every line in that snippet cost a sixty-second silence to find.

The plugin

Once ACKs worked, the rest of the protocol turned out to be small. A handful of CID event types we care about (130 burglary, 401/407 local/remote arm-disarm, 760 zone open/close, 770 output activate/deactivate), a few operation types for arming and disarming (12, 13, 14 for the three arm modes; 17 for disarm), and a parameter-query mechanism for asking the panel about installed zones and their names.
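
For orientation, that whole protocol surface fits in a handful of constants. The sketch below groups them the way src/protocol.ts might; the names are mine, and since I haven’t said which of 12/13/14 belongs to which arm mode, the sketch deliberately keeps them as an undifferentiated list.

// CID event types the plugin cares about (values listed above).
export const CID = {
  BURGLARY: 130,          // burglary alarm on a zone
  LOCAL_OPEN_CLOSE: 401,  // armed/disarmed at the keypad
  REMOTE_OPEN_CLOSE: 407, // armed/disarmed remotely
  ZONE_STATUS: 760,       // zone open/close
  OUTPUT_STATUS: 770,     // output activate/deactivate
} as const;

// Operation types sent to the panel.
export const OPTYPE = {
  ARM_MODES: [12, 13, 14] as const, // the three arm modes; which is which is the panel's business
  DISARM: 17,
} as const;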

I followed the same playbook I used for Chronomatic and MIDI Set List: an in-memory fake for the panel, and an ATDD suite that exercises the plugin end-to-end against the fake. Pure protocol logic (encoding, decoding, ACK shape) lives in src/protocol.ts and has unit tests, because it has zero I/O. Functionality is tested through a harness that boots a real Homebridge, hands it a real plugin config, opens a real TCP socket from the fake panel, and pokes the running plugin through Homebridge’s HTTP API the same way a user would. All logic for interacting with the alarm system resides in src/driver.ts, which is exercised by an integration suite against the same fake but without Homebridge in the loop; that suite exists mainly for asserting edge cases in the wire protocol, like the back-pressure issue described below, without the overhead of booting Homebridge and running the full end-to-end suite.
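
As a flavour of those protocol-level unit tests, here is the kind of thing they pin down about the ACK shape. This is a sketch rather than the plugin’s actual test file: the import path and the PanelFrame literal are assumptions, and I’m using Node’s built-in test runner purely for illustration.

import { it } from 'node:test';
import assert from 'node:assert/strict';
import { ackFrame } from '../src/protocol';

it('echoes account as a number, counter as-is, with frame_type ACK and kc 1', () => {
  const ack = ackFrame({ frame_type: 'event', counter: 42, account: '1234', type: 401, partition: 1 });
  assert.deepEqual(ack, { account: 1234, counter: 42, frame_type: 'ACK', kc: 1 });
  // Field order matters on the wire; JSON.stringify preserves key insertion order.
  assert.equal(JSON.stringify(ack), '{"account":1234,"counter":42,"frame_type":"ACK","kc":1}');
});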

The plugin ended up exposing:

  • A SecuritySystem accessory per partition. HomeKit’s three armed states (AWAY_ARM, STAY_ARM, NIGHT_ARM) map to Pima’s Full Arm / Home 1 / Home 2; DISARM is DISARM (see the mapping sketch just after this list). A per-partition config block can hide modes the user doesn’t use.
  • A typed sensor per zone. Same panel-side semantics for every zone, but the user picks contact, motion, leak or smoke and gets the matching HomeKit icon and automation primitives.
  • A global Switch for the siren. This was the one feature I asked for by name, after a water leak incident a few months earlier where the siren went off and I couldn’t turn it off without going to the panel and entering a code. The plugin is the only way to silence the siren without that physical interaction, so it has to work.
  • State sync from any source. If someone arms the panel from the keypad or the Pima app, the HomeKit tile updates within a second. The plugin doesn’t assume it’s the only thing talking to the panel; it listens to type 407 (remote O/C) and 401 (local O/C) events and updates the SecuritySystem accordingly. A HomeKit accessory that disagrees with reality is worse than no accessory at all.
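
Concretely, the partition mapping in the first bullet boils down to a small lookup. A sketch, assuming HomeKit’s standard numeric values for SecuritySystemTargetState (the same 1-for-AWAY and 3-for-DISARM magic numbers that show up in the raw test later in this post); the Pima-side names are mine:

// HomeKit SecuritySystemTargetState: 0 = STAY_ARM, 1 = AWAY_ARM, 2 = NIGHT_ARM, 3 = DISARM.
type PimaArmMode = 'FULL_ARM' | 'HOME_1' | 'HOME_2' | 'DISARM';

const HOMEKIT_TO_PIMA: Record<number, PimaArmMode> = {
  1: 'FULL_ARM', // AWAY_ARM  -> Full Arm
  0: 'HOME_1',   // STAY_ARM  -> Home 1
  2: 'HOME_2',   // NIGHT_ARM -> Home 2
  3: 'DISARM',
};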

A representative test from the end-to-end suite:

it('UI SecuritySystem AWAY target sends an AWAY-arm OPERATION', async () => {
  using alarm = await harness.connectAlarm();
  const partition = harness.homebridge.partition(partition1.name);
  await partition.setTarget(DISARMED);
  await partition.setTarget(AWAY_ARM);
  const op = await alarm.nextOperation({ optype: OPTYPE_ARM_AWAY, partition: partition1.id });
  assert.equal(op.partition, partition1.id);
});

No setTimeout, no await sleep(2000), no flake. nextOperation returns when the matching frame arrives and rejects when it doesn’t; setTarget resolves when the SET round-trips through Homebridge’s HAP IPC. The test reads like a script: poke a button, expect a frame.
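
There’s no magic to nextOperation under the hood: it’s a promise that resolves on the first matching frame and rejects on a deadline. A sketch of the idea rather than the plugin’s code; the frame-emitting EventEmitter is a stand-in for however the fake actually surfaces received frames:

import { EventEmitter } from 'node:events';

// Resolve with the first OPERATION frame that matches, reject on timeout.
function nextOperation(
  frames: EventEmitter, // hypothetical: emits 'frame' for every frame the fake receives
  match: { optype: number; partition: number },
  timeoutMs = 5000,
): Promise<Record<string, unknown>> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      frames.off('frame', onFrame);
      reject(new Error(`no OPERATION matching ${JSON.stringify(match)}`));
    }, timeoutMs);
    const onFrame = (f: Record<string, unknown>) => {
      if (f.frame_type === 'OPERATION' && f.optype === match.optype && f.partition === match.partition) {
        clearTimeout(timer);
        frames.off('frame', onFrame);
        resolve(f);
      }
    };
    frames.on('frame', onFrame);
  });
}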

A bug the tests missed

The system works. I installed the plugin into my Homebridge instance, pointed it at the real panel, and it arms and disarms on voice command. The zone sensors update when I open and close the zones. The siren turns off when I toggle the switch. It’s a success. But I soon discovered that the arm state of partitions was not being synced on Homebridge boot. If I armed the panel, then restarted Homebridge, the partition would show as disarmed until I toggled it from HomeKit or the keypad again.

The bug surfaced when I asked Claude to sync the partition states on boot. The naive implementation fired three “get partition state” DATA-REQ queries back-to-back, one per partition. After deploying, partition 1’s state would land in HomeKit; partitions 2 and 3 would stay on whatever default Homebridge had cached, the log showed NAKs, and the alarm system disconnected and reconnected, which triggered another sync and created an infinite loop of failed syncs and reconnects. However, all the tests were green.

The debug log showed what was happening on the wire: the panel answered the first DATA-REQ and NAK’d the other two with counter=0 and data:"JSON frame", the same misleading rejection string I’d hit a week earlier with the malformed ACK. The panel firmware processes one Home Automation request at a time; a new one mid-flight gets dropped on the floor.

The fake didn’t enforce this. It happily answered any DATA-REQ at any time, in any order, with any concurrency, because we’d written it as a passive responder rather than as a model of the real device. So the tests passed.

This is precisely the failure mode I’ve been warning clients about for years. A fake is only as good as its fidelity to the real thing, and the way to keep it honest is a contract test against the real implementation. We had no contract test. We had a fake, written from Claude’s own reading of the protocol, drifting quietly from the actual hardware.

The fix had two parts. First, the fake had to model the back-pressure:

if (frame.frame_type === 'DATA-REQ') {
  if (inflightQueryCounter !== null && autoReject.racingDataReqs) {
    // Mirror real-panel back-pressure: NAK counter=0 "JSON frame".
    write({ frame_type: 'NAK', counter: 0, account: accountString, data: 'Invalid JSON frame' });
    continue;
  }
  inflightQueryCounter = Number(frame.counter);
  /* … */
}

With the fake telling the truth, the partition-state-on-connect tests turned red. Now we had a failing test. The second part was a single in-flight slot in the driver, serializing every outbound DATA-REQ and OPERATION against the previous one’s settlement. Twenty lines of code. The bug had been live for two releases.
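
The serializer is small enough to sketch from that description: a one-slot queue where each outbound frame waits for the previous one to settle. An illustration of the idea, not the driver’s actual code:

// Serialize outbound DATA-REQ / OPERATION frames: the panel NAKs anything
// that arrives while another Home Automation request is still in flight.
class OutboundQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(send: () => Promise<T>): Promise<T> {
    // Wait for the previous request to settle (fulfilled or rejected),
    // then fire this one; a failed request must not wedge the queue.
    const next = this.tail.then(send, send);
    this.tail = next.catch(() => undefined);
    return next;
  }
}

Every partition-state query and arm/disarm operation goes through the same slot, so the three boot-time DATA-REQs leave one at a time instead of racing each other.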

Sadly, writing a contract test against the real alarm system is impractical, as it would entail me entering codes and toggling the siren every time I ran the suite. But the packet capture script is a good second-best, and it caught the regression immediately on release.

Refactoring the tests with an AI pair

The end-to-end tests I quoted above didn’t always look that way. An early version of the same kind of test:

it('UI SecuritySystem DISARM target sends disarm OPERATION (optype=17)', async () => {
  const alarm = await fix.connectAlarm();
  try {
    alarm.send({ frame_type: 'null', counter: 40, account: String(fix.account) });
    await alarm.waitForRx(1);
    const partition = await findAccessoryByName(fix, 'E2E Partition');
    await fix.api('PUT', `/api/accessories/${partition.uniqueId}`, {
      characteristicType: 'SecuritySystemTargetState', value: 1, // AWAY_ARM
    });
    await new Promise((r) => setTimeout(r, 100));
    const since = alarm.received.length;
    await fix.api('PUT', `/api/accessories/${partition.uniqueId}`, {
      characteristicType: 'SecuritySystemTargetState', value: 3, // DISARM
    });
    const deadline = Date.now() + 5000;
    let op: Record<string, unknown> | undefined;
    while (Date.now() < deadline) {
      op = alarm.received.slice(since).find((f) => f.frame_type === 'OPERATION' && f.optype === 17);
      if (op) break;
      await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
    }
    assert.ok(op, `no DISARM OPERATION received; got: ${JSON.stringify(alarm.received.slice(since))}`);
    assert.equal(op.optype, 17);
    assert.equal(op.partition, 2);
  } finally {
    alarm.close();
  }
});

This is the same kind of test as the clean version above, written through twenty-five lines of plumbing: hand-rolled polling loops, raw HTTP PUTs with magic numbers for HomeKit enum values, a setTimeout(100) to wait for the SET to settle because Claude didn’t have a real synchronization primitive, a try/finally to close the socket. Every test in the suite looked roughly like this. The duplication wasn’t malicious; it was just the natural state of tests written by an AI agent according to its training. It was actually pretty good for what it was, and it got the job done, but it was unreadable and would have been a nightmare to maintain if I ever had to come back to it.

In the pre-AI era, I would have looked at this thirty-test file, made a mental note to refactor when I had time, and shipped it. Refactoring test code has clear, immediate value and zero glamour. It doesn’t show up in a release note. It doesn’t fix a bug. It just makes the next test cheaper to write, and “the next test” is always somewhere in the future.

Instead, I told Claude to index other projects of mine and come up with a set of best practices for test infrastructure. It came up with a test-support/ directory with domain verbs; an eventually primitive to replace the polling loops; a closure-based driver for the alarm fake exposing verbs like nextOperation, verify, report, respond; and a Homebridge UI driver with sub-drivers for partition(name), zone(name) and siren(name). Tests migrated one describe block at a time, with the suite green at every step. We did roughly thirty tests over an hour. The driver test file shrank from 775 to 620 lines. The number of raw alarm.write('{"frame_type":...}') strings dropped from twenty-five to three (the three intentional ones, for protocol-violation tests). Every await new Promise(r => setTimeout(r, ...)) was gone.
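
The eventually primitive is the least glamorous piece of that list, and also the one that killed the sleeps. A sketch of what such a helper looks like; the real one in test-support/ may differ in signature and defaults:

// Retry an assertion until it stops throwing or the deadline passes,
// then surface the last failure instead of a generic timeout.
export async function eventually(
  assertion: () => void | Promise<void>,
  { timeoutMs = 5000, intervalMs = 25 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      await assertion();
      return;
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      await new Promise((r) => setTimeout(r, intervalMs));
    }
  }
}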

I want to be careful about the framing here. There’s a lazy version of “I used an AI” that means generating code and pasting it in, and that isn’t what we did. I knew the shape I wanted; I’ve designed test harnesses like this for clients, on projects from React Native to Go to Flutter. What Claude did was carry the mechanical translation load: it migrated thirty tests, kept the suite green at every step, and let me focus on the design decisions about which verbs belong in the driver, which things are getters and which are actions, where the cleanup boundary sits. The hard part was still mine. The boring part was no longer mine.

A senior engineer pairing with a junior would have produced the same refactor, in roughly ten times the time.

What made any of this possible

This is the part where I have to credit the feedback loop, because Claude could just as easily have made a mess. The protocol layer had unit tests, the plugin had an end-to-end suite and the real panel was on my LAN for final live verification. Every change went through that loop. Every regression I’ve described in this post — the fake that didn’t model back-pressure, the silently-lost partition states, the back-to-back DATA-REQ race — was caught and pinned by a failing test before the fix landed.

This is the executable law I wrote about a few months ago, and the project would not have shipped without it. A rule in a prompt is a suggestion. A failing test is a wall. Claude was sometimes brilliant, sometimes sloppy, occasionally hallucinating about an API that doesn’t exist; none of it mattered, because what made it into a release was the version where the suite was green.

The plugin is on npm and the code is on GitHub. Total time spent: one day plus some evenings. Number of lines of code I typed myself, excluding commit messages, is in the low double digits. I don’t really know how the Homebridge part looks, nor do I care. I can read the test, and I can observe the real-life behavior, and that’s enough. It’s still my plugin. It’s just a plugin I would not have written on my own.