Checking the foundations when you'd rather keep building

The uncomfortable decision to dig up the floor when the building is almost finished.

BUILDING BEHAVIOURKIT

Lauren Kelly

10/10/2025

I did something this month that I'd been avoiding. I subjected the entire BehaviourKit database to a full construct integrity audit. Not a route audit. Not a feature check. A foundational audit that asks: are these building blocks actually sound enough to carry the weight of an automated system?

The timing is inconvenient. The contradiction matrix is built. Quick Start is designed. The evidence base is growing. There's momentum. And instead of riding that momentum, I'm stopping to dig up the floor.

Here's why.

When I used the system in workshops, I was the quality control. If a driver definition was slightly fuzzy, I'd interpret it correctly in the moment. If two constructs overlapped, I'd route around the overlap based on context. If a mapping was plausible but not precise, I'd adjust my recommendation intuitively. The system didn't need to be perfect because I was there to catch imperfections.

An automated system doesn't have that luxury. Every fuzzy boundary becomes a potential misroute. Every overlapping construct becomes a place where the system might diagnose the wrong thing. Every mapping that's "close enough" becomes a recommendation that's "close enough," which in practice means sometimes wrong.

I set the audit up with a deliberately high standard. The question wasn't "is this content good?" It was "is each construct behaviourally meaningful, clearly separated from its neighbours, and defined precisely enough that two different practitioners would classify the same case the same way?"

The audit came back with a verdict that I found both reassuring and uncomfortable: "The system looks extensive and thought-through. The problem is not that it lacks structure. The problem is that some of the structure may be giving a false sense of clarity."

The biggest finding was about the driver layer. Twenty-five drivers, sitting in one flat table, all treated as the same kind of thing. The audit pointed out, correctly, that they aren't the same kind of thing.

Some are genuine behavioural drivers. Person-level mechanisms like confidence, memory, habit strength. Things that happen inside someone's head and directly influence what they do.

Some are structural constraints. System-level barriers like regulations, resource availability, physical structures. These aren't about what's happening in someone's head. They're about what the environment allows or prevents.

Some are social signals. Group-level influences like norm visibility, group uptake, identity fit. These operate through social dynamics, which follow different logic from individual cognition.

In a facilitated workshop, mixing these together in one grid works fine. You mark them all red or green and discuss. The facilitator knows that "Regulations" requires a different kind of intervention from "Confidence," even though they're both in the grid.

In an engine that needs to route automatically, the mixing is a problem. A behavioural driver and a structural constraint need different lever families. A social signal and a contextual condition need different evidence standards. Treating them all as peers in a flat list means the routing logic can't distinguish between fundamentally different types of problem.
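The distinction the paragraph above describes can be sketched in code. This is a minimal illustration, not BehaviourKit's actual implementation: the type names follow the four categories named in this post, but the lever-family labels and the `route` function are hypothetical stand-ins.

```python
from enum import Enum, auto

class DriverType(Enum):
    """The four construct types the audit distinguishes."""
    BEHAVIOURAL_DRIVER = auto()     # person-level mechanism (e.g. confidence)
    STRUCTURAL_CONSTRAINT = auto()  # system-level barrier (e.g. regulations)
    SOCIAL_SIGNAL = auto()          # group-level influence (e.g. norm visibility)
    CONTEXTUAL_CONDITION = auto()   # situational factor

# Illustrative mapping: each type routes to a different lever family.
# In a flat, untyped driver list, this dispatch is impossible.
LEVER_FAMILIES = {
    DriverType.BEHAVIOURAL_DRIVER: "capability and motivation levers",
    DriverType.STRUCTURAL_CONSTRAINT: "environmental restructuring levers",
    DriverType.SOCIAL_SIGNAL: "norm and modelling levers",
    DriverType.CONTEXTUAL_CONDITION: "timing and context levers",
}

def route(driver_type: DriverType) -> str:
    """Route a flagged driver to a lever family based on its type."""
    return LEVER_FAMILIES[driver_type]
```

The point of the sketch is the dispatch itself: "Regulations" and "Confidence" can both sit in one grid, but once each entry carries a type, the engine can send them down different paths automatically.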

Seven years of building. Thousands of hours of cataloguing, naming, sorting, connecting. And the foundations have cracks in them.

Cracks, not collapse. The content is mostly sound. The constructs are mostly real. The connections are mostly defensible. But "mostly" isn't good enough for an engine that will make recommendations without me standing there to catch the wobbles.

Specific drivers were flagged as higher risk. "Attitudes" was doing too much work under one name, collapsing evaluation, preference, willingness, and value judgement into a single entry. "Willpower" carried a folk-psychology quality that made it unreliable for consistent diagnosis. "Support from Others" blended encouragement, practical help, accountability, and social reinforcement into one construct. These aren't bad ideas. They're constructs that are too broad or too fuzzy for the precision the routing engine needs.

The repair process is painstaking. I'm working through the drivers in batches, starting with the strongest constructs to establish a quality benchmark, then using that benchmark to assess the weaker ones. Each driver gets typed (behavioural driver, structural constraint, social signal, or contextual condition). Each one gets an honest state model. Each one gets a boundary note explaining what it's most commonly confused with. Each repair is checked against the neighbouring constructs to make sure tightening one definition doesn't create a new overlap somewhere else.
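The repair record described above can be sketched as a data structure. Everything here is an assumed shape, not the real schema: the field names, the state labels, and the overlap check (which uses shared state labels as a rough proxy for construct overlap) are all illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class DriverRecord:
    """Hypothetical shape of one repaired driver entry."""
    name: str
    driver_type: str               # one of the four audit types
    states: list[str]              # honest state model, not just red/green
    boundary_note: str             # what the construct is most confused with
    neighbours: list[str] = field(default_factory=list)

def check_no_new_overlap(repaired: DriverRecord,
                         catalogue: dict[str, DriverRecord]) -> list[str]:
    """Return neighbours whose state labels collide with the repaired
    driver -- a crude stand-in for the real boundary check."""
    collisions = []
    for name in repaired.neighbours:
        other = catalogue.get(name)
        if other and set(repaired.states) & set(other.states):
            collisions.append(name)
    return collisions
```

A usage sketch: after tightening "Confidence", you would run `check_no_new_overlap` against its listed neighbours and get back any constructs the repair has newly collided with, so that fixing one definition doesn't silently break another.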

It's the right thing to do. And it's the most uncomfortable phase of the entire project. And it's taking a lot of brain juice to get right.
