I spent months building Trucey. Designing the personality calibration, the theory-grounded advice generation, the role-based rehearsal simulation where an AI would play your difficult boss back at you with enough resistance to actually make you sweat a little. This wasn’t a chatbot I threw together. Every design decision was grounded in actual negotiation theory — Brett and Thompson’s framework for how power asymmetry shapes these conversations, Bradley and Campbell’s model of how difficult workplace interactions unfold across phases, and what the literature tells us about why people fail to negotiate even when they know exactly what they should do. Fear. Perceived risk. The weight of power imbalance. I wanted to build something that addressed all of that.
I believed in it. I built it because I genuinely thought that if you could rehearse a hard conversation — asking for a raise, pushing back on a decision, negotiating time off with someone who already had power over you — in a space that felt somewhat real, grounded in theory about how these conversations go wrong and why, you would walk into the actual conversation less afraid.
And then we ran the results.
And a PDF beat it.
Trucey did something well: it reduced fear better than anything else we tested. But when we looked at whether people felt empowered, whether they felt genuinely capable and confident and in control of their own preparation, a static, theory-grounded handbook outperformed both AI conditions. Not slightly. Significantly.
My first reaction was some version of huh. what. why.
My second reaction, somewhere underneath that, was relief.
Here’s something I don’t think gets said enough about running experiments — you’re not supposed to be rooting for your intervention. I know that sounds strange. You built the thing, you care about it, you want it to work. But science that only confirms what you already believed isn’t really science. It’s expensive validation.
Karl Popper had a lot to say about this. His argument against inductive reasoning (the idea that confirming instances of something eventually prove a theory) was essentially that no number of white swans rules out the existence of a black one. Real science, he argued, works deductively. You start with a bold hypothesis and you try as hard as you possibly can to prove it wrong. If it survives genuine attempts to falsify it, that’s what gives it credibility, not confirmation. Survival of rigorous challenge.
The results that make you stop and sit with discomfort for a while — those are the ones worth something. Those are the ones that actually move things somewhere.
The handbook winning wasn’t a failure of Trucey. It was the experiment working exactly as it should.
But it did leave me with a question I couldn’t shake. Why would a document, something you could stumble across on a blog or find linked in a newsletter, outperform a carefully designed, interactive, personalized, theoretically grounded AI coach on the metrics that matter most for actually feeling ready? And not just any AI coach. One that was explicitly designed to address what we already know AI gets wrong: the verbosity, the cognitive overload, the lack of contextual grounding. Shorter responses. Better readability. Scaffolded learning. Theory shaping not just the content but the delivery itself. We did the work. And the document still won.
So I went and sat with people.
I want to say something honest about qualitative interviews that no methods textbook quite prepares you for. When you read about semi-structured interviews, everything is about protocol. Don’t lead the participant. Let them guide you. Have a loose direction but hold it lightly. All of that is true and none of it is the whole thing.
What the textbooks don’t tell you is that the structure is actually the least important part. What matters is what you do when someone hands you an opening mid-conversation, and whether you’re present enough to catch it. Whether you can file something away that someone said in minute four and understand it completely differently by minute forty because of everything that came in between. That’s a skill the protocol can gesture at but can’t teach you. You learn it by sitting across from a real person and realizing the guide is just the beginning.
And walking into those conversations already knowing that the quantitative results had surprised me sharpened my listening in a way I hadn’t anticipated. I wasn’t leading people anywhere. But I knew what I was listening for. There’s a difference, and it made the interviews richer, not poorer.
I realize there’s a certain irony in invoking Popper here. The quantitative side of this study is as close to Popperian as I could make it — a pre-registered experiment, genuine controls, real attempts to let the hypothesis fail. But the qualitative side? Braun and Clarke’s reflexive thematic analysis. Explicitly inductive. Let the patterns emerge, follow where the data leads, don’t go in with a predetermined answer. Philosophically almost the opposite.
And yet even that wasn’t purely inductive. I walked into those interviews with a quantitative surprise already sitting in the back of my mind. I wasn’t testing a hypothesis, but I wasn’t blank either. Somewhere between deduction and induction lives abduction: inference to the best explanation, the most human way to reason. You have something unexpected and you go looking for the most plausible account of why. That’s what the interviews were really doing.
Mixed methods is sometimes sold as just using two types of data. I think it’s actually about holding two different ways of knowing simultaneously and letting them talk to each other.
What people told me, in their own words, made everything click.
When you’re already anxious about a hard conversation — when you’re managing the fear of saying the wrong thing to someone who has real power over you — being guided through a preparation tool at the system’s pace, in the system’s sequence, adds a whole new cognitive job on top of the emotional one you’re already carrying. The AI, however thoughtfully designed, was still making decisions about your preparation for you. It decided what came next. It decided the pacing. It decided when you’d moved on from one thing to the next.
The handbook just sat there and let you be in charge.
You could skim. You could jump to the section you needed. You could read the same paragraph four times. You could close it and come back when you were ready. The information was yours to navigate, not something being delivered to you on someone else’s terms.
When you’re scared, that difference is everything.
And here’s the thing I keep thinking about now, beyond this study, beyond Trucey.
We are in a moment where AI is being layered onto everything. Every app, every workflow, every professional development tool is getting a conversational interface dropped onto it with the implicit assumption that more interactive, more personalized, more guided automatically means better. That assumption is almost never questioned out loud.
But what this study quietly suggests is that new interaction paradigms don’t simply inherit the benefits of old ones and add more. They make tradeoffs, real ones. And we’re not always honest about, or even aware of, what gets lost in those tradeoffs. The question worth asking isn’t just “does the AI work?” It’s “does this interaction model actually fit what the person needs in this moment?”
Because a person preparing for a scary conversation with their boss is not in the same psychological state as someone browsing for information. They’re already cognitively loaded, already emotionally stretched. Adding a system that requires them to converse, respond, follow along, and keep up is not always a gift. Sometimes it’s just more weight.
We keep asking whether AI is better than what came before. I think the more honest question is — better at what, for whom, and at what cost to the things we stopped paying attention to when we got excited about the new thing?
The handbook won not despite its simplicity but because of it.
I built something sophisticated and learned something humbling. That feels about right for a first experiment. And honestly, I wouldn’t have it any other way.
This post is based on my research on AI coaching for workplace negotiations. If you want the full study, the tables, and the regression models, you can find the preprint here. And if you want to talk about it, reach out.