The problem a team of one can't brute-force

How AI became the quality layer for a system too complex for one person to hold.

BUILDING BEHAVIOURKIT

Lauren Kelly

1/10/2025

BehaviourKit has a complexity problem, and it's the kind that working harder won't fix.

The system now has twenty-five drivers. Each one has a definition, a low-state description, a high-state description, diagnostic questions, linked mechanisms, and connections to levers. Each lever has its own definitions and links to tactics. Each tactic connects to specific plays. Each play has conditions for when to use it, when to avoid it, supporting evidence, and template copy for implementation.
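To make the shape of that concrete, here is a minimal sketch of the layered structure as Python dataclasses. The class and field names are my own illustrative guesses at the schema, not BehaviourKit's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Play:
    name: str
    use_when: list[str]       # conditions for when to use it
    avoid_when: list[str]     # conditions for when to avoid it
    evidence: list[str]       # supporting evidence
    template_copy: str        # template copy for implementation

@dataclass
class Tactic:
    name: str
    definition: str
    plays: list[Play] = field(default_factory=list)

@dataclass
class Lever:
    name: str
    definition: str
    tactics: list[Tactic] = field(default_factory=list)

@dataclass
class Driver:
    name: str
    definition: str
    low_state: str
    high_state: str
    diagnostic_questions: list[str]
    mechanisms: list[str]
    levers: list[Lever] = field(default_factory=list)
```

Even in this stripped-down form, twenty-five drivers each carrying several levers, tactics, and plays multiply out quickly, and an edit to one `Driver.definition` can quietly invalidate assumptions in a `Play` three links away.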

That's hundreds of interconnected data points. And every time I change one thing, I need to check whether it's broken something three layers away.

For six years I've managed this by holding as much of the system in my head as possible and working through it piece by piece. Update a driver definition. Manually check whether the lever connections still make sense. Read through the linked plays. Spot an inconsistency. Go back. Fix it. Move on. Check the next connection.

It works. It's also incredibly slow, and I'm increasingly aware that my brain is not a reliable database. I forget which version of a definition I settled on last month. I lose track of which driver-lever connections I've already reviewed. I fix something in one sheet and don't notice that the same concept appears in three other sheets with slightly different wording.

This month I started working with AI as a building partner, and I want to describe what that actually means, because it's different from what most people assume.

It's not about speed, although things do move faster. It's not really about making my implicit knowledge explicit, although that happens along the way. And it's certainly not about the AI knowing behavioural science better than I do. It doesn't.

What it's about is system complexity.

I input the rules. The behavioural logic. The ontology. The relationships between constructs. The conditions under which a connection is valid. The constraints that should prevent certain routes. All of that comes from me. From six years of cataloguing, pattern-finding, workshop testing, and evidence gathering. The AI doesn't generate any of that knowledge.

What the AI does is play those rules out across the entire dataset and tell me where things don't hold together.

Here's a concrete example. I have a rule: if a driver is a structural constraint rather than a behavioural driver, it should route to a different family of levers. I know this rule. I wrote this rule. But checking it manually across twenty-five drivers, each with multiple lever connections, each connection with its own rationale and conditions, is a full day of focused work. And I'll still miss something, because by row eighteen my attention is drifting.

When I give the AI the rule and the dataset, it checks every row, flags every violation, and shows me exactly where the logic breaks down. In minutes rather than days. And without the attention drift.
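A check like that is ultimately just a rule applied to every row. As a sketch of the idea, here is one possible encoding in Python; the driver names, the `structural_constraint` kind, and the `motivational` lever family are invented placeholders standing in for whatever the real routing rule forbids:

```python
# Toy rows linking drivers to lever families. Kinds and families are
# hypothetical labels, not BehaviourKit's real vocabulary.
rows = [
    {"driver": "Time pressure",  "kind": "structural_constraint", "lever_family": "motivational"},
    {"driver": "Confidence",     "kind": "behavioural",           "lever_family": "motivational"},
    {"driver": "Tooling access", "kind": "structural_constraint", "lever_family": "environmental"},
]

def violations(rows):
    """Flag rows where a structural constraint routes to a lever family
    reserved for behavioural drivers (here: 'motivational')."""
    return [
        r for r in rows
        if r["kind"] == "structural_constraint"
        and r["lever_family"] == "motivational"
    ]

for v in violations(rows):
    print(f"rule broken: {v['driver']} routes to {v['lever_family']} levers")
```

The point is not this particular predicate; it is that once the rule is stated explicitly, checking row eighteen costs exactly the same attention as checking row one.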

That's the real value. Not that the AI thinks for me. That it lets me think across a system too large for one person to hold at once.

The other thing that's changed is the ability to run small, focused experiments on slices of the database. Instead of trying to overhaul the entire driver layer in one massive effort, I can take seven drivers, apply a specific set of quality rules to just those seven, check whether the improvements hold up, and learn from the results before touching the rest.

That's an iterative approach that works well in software development and design but is very difficult to apply to a taxonomy or ontology when you're working alone with spreadsheets. The overhead of setting up each experiment, running the checks, and validating the results was so high that I tended to batch everything into large, infrequent passes. Which meant errors accumulated between passes and each fix generated new problems I wouldn't catch until the next big review.

Now I can work in small slices. Seven drivers at a time. One lever family at a time. A specific set of route connections. Test the rules. See what breaks. Fix it. Move on. Come back and check the neighbouring slice.
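The slice-at-a-time workflow can be sketched the same way: define the quality rules once, then run them over just the subset you are currently working on. Everything below is illustrative; the catalogue, the single completeness rule, and the slice boundaries are assumptions, not my actual setup:

```python
# Toy catalogue: driver name -> record, with a few records left incomplete.
catalogue = {
    f"driver_{i:02d}": {"definition": "..." if i < 20 else ""}
    for i in range(25)
}

def missing_definition(name, record):
    """One hypothetical quality rule: every driver needs a definition."""
    return record["definition"] == ""

RULES = [missing_definition]

def audit_slice(names):
    """Run every rule over just this slice; return (driver, rule) failures."""
    return [
        (n, rule.__name__)
        for n in names
        for rule in RULES
        if rule(n, catalogue[n])
    ]

names = list(catalogue)
clean = audit_slice(names[:7])     # first slice of seven drivers
flagged = audit_slice(names[18:])  # neighbouring slice, checked later
```

Because each audit is cheap to run, the cost of an experiment drops from "set aside a day" to "call the function on a different slice", which is what makes the fix-then-check-the-neighbours loop viable.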

I want to be clear about what hasn't changed. The domain knowledge is still mine. The behavioural rules are mine. The judgement about what constitutes a valid construct, a plausible mechanism, or a defensible route is mine. When the AI flags an inconsistency, I'm the one who decides whether it's a genuine problem or an acceptable edge case. When a construct needs redefining, I write the new definition based on what I know about how practitioners use it.

The AI is not a co-designer. It's more like a very thorough, very patient quality assurance process that I can direct and redirect as I learn.

What would have taken me another full year of solo work, I can now move through in focused stretches. The system that was too complex for one person to audit reliably is now auditable. The experiments that were too expensive to run frequently are now cheap enough to run weekly.

I'm still a team of one. But the constraint has shifted from "I can't hold this much complexity" to "I need to decide what to check next." That's a much better constraint to have.