Evidence and the kind it needs
Why proving a system works requires different evidence from proving its parts work.
BUILDING BEHAVIOURKIT
Lauren Kelly
11/17/2023
I've been working on the evidence layer between the drivers and the patterns. Not gathering new patterns or identifying new drivers. Working on the connections. The logic that says: when this particular driver is the barrier, this particular pattern is the right response.
This turns out to be a different kind of evidence problem from the one I solved during the cataloguing.
When I built the pattern library, the evidence question was: does this intervention shape work? Can I find real cases where it was tried and produced results? That's a content claim. Does this pattern exist in the wild and does it do what it says? The answer, for most of the fifty-two patterns, is yes. I catalogued them from real cases. They have provenance.
The question I'm working on now is more specific: when this driver is in this state, does routing to this lever and this pattern actually produce better outcomes than routing somewhere else? That's a routing claim. And routing claims need routing evidence.
Some connections have it. "Cue Visibility is low, so make the signal more visible" is well-supported in the academic literature. There are studies showing that increasing the salience of a cue at the point of action increases the likelihood of the desired behaviour. The mechanism is clear, the evidence is strong, and the connection between the driver and the intervention is direct.
Others are more like educated reasoning. "Competing Priorities is the barrier, so help people follow through with commitment devices and scheduling support." That makes theoretical sense. I've seen it work in practice. I could construct a plausible mechanism chain from the literature. But I can't point to a single study that tested that exact connection and measured the difference.
The distinction matters because I want the system to be honest about what it knows. If a connection has strong evidence, the system should say so. If the connection is plausible but untested, the system should say that too. Different confidence levels for different connections. Not a binary of "proven" and "unproven," but a spectrum: strong, reasonable, and theoretical.
I'm finding this distinction particularly important as the wider field matures. More practitioners are asking for evidence-based tools rather than evidence-inspired ones. The question "show me the study" comes up in client conversations more often than it used to, and it comes from project leads and commissioning managers, not just academics. That's a healthy development. It means the market is getting more discerning.
It also raises the bar considerably for what I'm building. I don't want BehaviourKit to be the kind of product that says "evidence-based" on the label and then delivers a mixture of strong findings, reasonable inferences, and educated guesses without distinguishing between them. The science community has a word for that, and it isn't flattering.
So I'm building a layer of evidence metadata into the system. For each connection between a driver and a pattern, I'm recording: what evidence exists, what kind of evidence it is (academic study, field trial, practitioner observation, theoretical inference), how strong the mechanism match is, and what the known boundary conditions are (when does this connection hold, and when does it break down?).
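As a sketch of what that metadata layer might look like in code: the field names and enum values below are illustrative assumptions drawn from the categories described above (evidence kind, confidence level, mechanism match, boundary conditions), not the actual BehaviourKit schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class EvidenceKind(Enum):
    ACADEMIC_STUDY = "academic study"
    FIELD_TRIAL = "field trial"
    PRACTITIONER_OBSERVATION = "practitioner observation"
    THEORETICAL_INFERENCE = "theoretical inference"

class Confidence(Enum):
    STRONG = "strong"
    REASONABLE = "reasonable"
    THEORETICAL = "theoretical"

@dataclass
class ConnectionEvidence:
    """Evidence metadata for one driver-to-pattern connection."""
    driver: str
    pattern: str
    evidence_kind: EvidenceKind
    confidence: Confidence
    mechanism_match: str                    # how directly the mechanism maps; free text for now
    boundary_conditions: list[str] = field(default_factory=list)  # when the route breaks down
    sources: list[str] = field(default_factory=list)              # studies or cases, if any

# Example: the well-supported route from the post.
cue = ConnectionEvidence(
    driver="Cue Visibility",
    pattern="Make the signal more visible",
    evidence_kind=EvidenceKind.ACADEMIC_STUDY,
    confidence=Confidence.STRONG,
    mechanism_match="direct: salience of cue at point of action",
    boundary_conditions=["holds only when the cue is placed at the point of action"],
)
```

Modelling the confidence levels as an enum rather than free text is the point of the exercise: the system can only be honest about "strong" versus "theoretical" if those labels are structured data it can query, not adjectives buried in a description.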
This is slow work. Each connection needs to be assessed individually. And the honest answer for some connections is "we don't have direct evidence for this specific route yet." Which is uncomfortable but important to acknowledge.
The evidence question has also made me think differently about what kind of product BehaviourKit needs to be. A toolkit can get away with being generally supported by the literature. A recommendation engine cannot. If the system is going to tell someone "try this pattern for this driver," it needs to know how confident it should be in that recommendation. And it needs to communicate that confidence to the user.
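To make that concrete, here is a minimal sketch of a recommendation step that surfaces its own confidence. The routing table, grades, and hedging phrases are all hypothetical placeholders, not BehaviourKit's real content; the point is only that the confidence label travels with the recommendation.

```python
# Hypothetical routing table: each entry maps a driver to a pattern
# with an evidence grade ("strong" > "reasonable" > "theoretical").
ROUTES = [
    {"driver": "Cue Visibility", "pattern": "Make the signal more visible", "grade": "strong"},
    {"driver": "Competing Priorities", "pattern": "Commitment device", "grade": "theoretical"},
]

GRADE_RANK = {"strong": 0, "reasonable": 1, "theoretical": 2}

HEDGES = {
    "strong": "well-supported in the literature",
    "reasonable": "a reasonable inference; no direct study of this exact route",
    "theoretical": "theoretical only; treat as a hypothesis to test",
}

def recommend(driver: str) -> str:
    """Return the best-evidenced pattern for a driver, with its confidence stated."""
    candidates = [r for r in ROUTES if r["driver"] == driver]
    if not candidates:
        return f"No catalogued route for driver '{driver}'."
    best = min(candidates, key=lambda r: GRADE_RANK[r["grade"]])
    return f"Try '{best['pattern']}' ({HEDGES[best['grade']]})."
```

The design choice worth noting: the hedge is generated from the same grade field used for ranking, so the system cannot rank a route highly while describing it modestly, or vice versa.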
Evidence, I'm learning, is not just about credibility. It's about honesty. The system needs to know what it knows, know what it doesn't, and be transparent about the difference.
© 2026 BehaviourStudio. All rights reserved. Behaviour Thinking is a registered trademark of BehaviourStudio.
