What the researchers found
Five problems with one root cause, found before a single line of code was written.
BUILDING BEHAVIOURKIT
Lauren Kelly
9/5/2025
I ran user research on the Quick Start feature spec this month. Sessions with applied behavioural scientists, product designers, and service designers. People who work with behaviour change tools professionally, who understand both the science and the practical realities of using it in projects.
I wanted to know: does this spec hold up when a critical, experienced practitioner walks through it? Where does it wobble? Where does it mislead? Where does it promise something the experience doesn't deliver?
They found five problems. All of them were real, and all of them shared the same root cause.
Problem one: the system was putting unconfirmed text into the team brief. When you use Quick Start, the system generates a synthesis of your problem based on your inputs. That synthesis was appearing in the shareable team brief before the user had confirmed it was accurate. Which means the system was trusting its own interpretation more than it trusted the person using it. If the synthesis was slightly wrong, and it easily could be, the brief would carry that error into a meeting room or a Slack channel without the user ever having a chance to correct it.
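The fix here is essentially a guard between the synthesis and anything shareable. A minimal sketch in TypeScript, using invented names (Synthesis, buildTeamBrief) since the spec's actual structures aren't shown here:

```typescript
// Illustrative only: Synthesis and buildTeamBrief are invented names,
// not BehaviourKit's actual types.
interface Synthesis {
  text: string;             // the system's interpretation of the user's problem
  confirmedByUser: boolean; // set only when the user explicitly agrees it is accurate
}

interface TeamBrief {
  problemStatement: string | null;
}

function buildTeamBrief(synthesis: Synthesis): TeamBrief {
  // Unconfirmed text never reaches the shareable brief;
  // the user's confirmation, not the system's confidence, is the gate.
  return {
    problemStatement: synthesis.confirmedByUser ? synthesis.text : null,
  };
}
```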
Problem two: there was no graceful path for uncertainty. If the recommendation landed and it didn't feel right, the user had no way to say "I'm not sure about this." The system assumed its output was accepted. In reality, recommendations land with varying degrees of confidence. Sometimes you need to sit with something before committing. The spec had no room for that.
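The spec modelled acceptance as binary: shown means accepted. A small sketch of what an explicit uncertainty state could look like, again with names I've invented for illustration:

```typescript
// Illustrative only: the state names are assumptions, not the spec's.
type RecommendationResponse =
  | { kind: "accepted" }
  | { kind: "unsure"; note?: string } // "I'm not sure about this" — park it, revisit later
  | { kind: "not_yet_reviewed" };

function canProceed(response: RecommendationResponse): boolean {
  // Only an explicit acceptance moves the flow forward;
  // "unsure" is a legitimate resting state, not a silent yes.
  return response.kind === "accepted";
}
```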
Problem three: internal labels were leaking into user-facing outputs. "Support from Others" and "Group Uptake" were appearing in team briefs. Those are database names. They're meaningful inside the system. They're meaningless to a creative director reading a brief, or a programme manager reviewing a plan. The translation layer, the thing that converts internal labels into plain language, wasn't running on external outputs.
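The underlying fix is to route every external output through the same translation step the in-app views use. A minimal sketch, with plain-language equivalents I've made up purely to illustrate the mapping:

```typescript
// Illustrative only: the plain-language wording here is invented;
// the point is that the map runs on every external output, not just in-app views.
const PLAIN_LANGUAGE: Record<string, string> = {
  "Support from Others": "support people get from those around them",
  "Group Uptake": "how widely a behaviour spreads through a group",
};

function translateLabel(internalLabel: string): string {
  // Fall back to the internal name rather than throwing,
  // so an unmapped label is still readable in the output.
  return PLAIN_LANGUAGE[internalLabel] ?? internalLabel;
}

function renderBriefLabels(labels: string[]): string[] {
  return labels.map(translateLabel);
}
```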
Problem four: a gating decision was blocking the primary use case. One participant needed the recommendation view to get pre-pilot approval from a stakeholder. The spec required check-ins before unlocking certain views, which meant the thing she needed to share for approval was locked behind a step she hadn't completed yet. The gate was in the wrong place.
Problem five: the most important sentence on the page was styled in a way that undermined its authority. The synthesis statement, the system's confirmed interpretation of the user's problem, was displayed in italics. Italics read as provisional, tentative, uncertain. The synthesis should feel like a settled starting point, something the user has confirmed and the system is presenting with confidence. The visual hierarchy was working against the content hierarchy.
Five problems. Same root cause in every case: something that was carefully designed in the spec was losing fidelity on its way to the user's experience.
This is a pattern I've seen before in service design work. The gap between intention and implementation. The designer meant one thing, the spec captured it, and then somewhere between spec and experience, the meaning drifted. Internal logic leaked through. Edge cases weren't handled. Visual signals contradicted the content.
The fix for each individual problem was straightforward. Confirm before publishing. Add an uncertainty path. Run the translation layer on all outputs. Move the gate. Change the typography. Small changes, each of them.
But the larger lesson is the one I'm taking forward. Test your spec as seriously as you'd test your product. The spec is the first version of the product. The problems the researchers found were spec problems, caught before a single line of code was written. That's the right time to find them. Once the spec has bugs, the product will inherit them unless someone intervenes.
I'm grateful to the people who gave their time and their critical eye to this. The findings were uncomfortable and useful in exactly the right proportions.
