AI Agent Design · Developer Experience · MongoDB Atlas

Building the skill that teaches agents to teach search

Designed and built the rule-governed AI agent skill that powers MongoDB Atlas's conversational search quickstart, from noticing the gap to shipping it.

Role: Product Designer + Builder
Project type: Developer quickstart
Platform: Claude / MCP Agent

Origin

I found the gap by being the user

I spend a lot of time in developer tools. Not as a developer exactly, but close enough to feel the friction. I was poking around MongoDB Atlas Search, genuinely trying to understand what it could do, and I noticed something: the documentation was thorough, but it didn't respond to me. It couldn't tell when I was lost. It had no idea I was on a free tier, following instructions that wouldn't work for my setup.

We had skills. We had MCP integrations. What we didn't have was a proper getting-started experience that met developers where they actually are — confused, impatient, and usually two environments away from the example in the docs. That felt like a real gap, not just a nice-to-have.

So I picked it up. No one asked me to. I just thought: this should exist, and I can build it.

"The docs can walk someone through the API. They can't say 'you're on a free tier — here's your path instead' in real time."

Approach

The decision tree is the design

The first thing I did was resist the urge to write a prompt. A single prompt that tries to handle every developer's situation gets flabby fast. What I needed was structure: a branching ruleset that governs the agent's behavior at every fork, while leaving room for it to be genuinely conversational in between.

The decision tree below isn't implementation detail. It was the primary design artifact. I drafted it before writing any skill code, walked through it against real developer scenarios, and rebuilt it several times before the routing logic felt right.

Decision tree (diagram):

Start
Step 0 (setup check): MCP connected?
    No: run setup skill first
    Yes: continue
Step 1 (dataset check): sample_mflix loaded?
    No: prompt user to load
    Yes: continue
Step 2 (branch point): What kind of search? Semantic or Keyword

Semantic path
    Step 3a (cluster tier): M10+ cluster? Yes / No
    Step 4a (wait for READY): Create autoEmbed index
    Step 5a: Run semantic query
    Step 6a (iterate): Try another query?
    Step 7a (pre-filter): Add genre filter?

Keyword path
    Step 3b (wait for READY): Create text index
    Step 4b: Run keyword query
    Step 5b (typo tolerance): Try fuzzy matching?
    Step 6b (prefix search): Try autocomplete?
    Step 7b (converge): Try hybrid search?

Hybrid: $rankFusion, semantic 70% + keyword 30%
Compare live results: flip weights, see difference
Wrap-up: export Python script

Path legend: semantic path, keyword path, decision / hybrid, setup / end
All paths converge at hybrid search
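As a rough illustration of the routing in the tree (not the skill's actual code), the branches can be written as one small function. The name `plan_path`, its parameters, and the exact step strings are hypothetical; the "manual embed" branch for clusters below M10 is taken from the rule system and test scenarios described later, since the diagram itself does not label it.

```python
from itertools import product

HYBRID = "Hybrid: $rankFusion, compare live results"
WRAP_UP = "Wrap-up: export Python script"


def plan_path(mcp_connected, dataset_loaded, search_kind, m10_plus=False):
    """Return the ordered step labels for one session (hypothetical helper)."""
    # Steps 0 and 1: preconditions block the tutorial instead of failing later.
    if not mcp_connected:
        return ["Step 0: run setup skill first"]
    if not dataset_loaded:
        return ["Step 1: prompt user to load sample_mflix"]

    # Step 2: branch on the kind of search the developer asked for.
    if search_kind == "semantic":
        # Step 3a: the cluster tier decides how embeddings are produced.
        index_step = (
            "Step 4a: create autoEmbed index, wait for READY"
            if m10_plus
            else "Manual embed path (assumed label for clusters below M10)"
        )
        steps = [
            index_step,
            "Step 5a: run semantic query",
            "Step 6a: try another query?",
            "Step 7a: add genre filter?",
        ]
    elif search_kind == "keyword":
        steps = [
            "Step 3b: create text index, wait for READY",
            "Step 4b: run keyword query",
            "Step 5b: try fuzzy matching?",
            "Step 6b: try autocomplete?",
            "Step 7b: try hybrid search?",
        ]
    else:
        raise ValueError(f"unknown search kind: {search_kind!r}")

    # Both branches converge on the hybrid comparison, then wrap up.
    return steps + [HYBRID, WRAP_UP]


# Every path that passes the preconditions reaches the hybrid step.
ALL_PATHS = [
    plan_path(True, True, kind, tier)
    for kind, tier in product(["semantic", "keyword"], [True, False])
]
```

Writing the tree this way makes the convergence constraint checkable: any path that clears Steps 0 and 1 should contain the hybrid step before wrap-up.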

The thing I kept coming back to: every path had to converge at hybrid search. Regardless of what a developer came in wanting to do, they should leave having seen $rankFusion run on real data. That wasn't just a nice outcome. It was a design constraint I built the whole tree around.
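For readers who have not seen `$rankFusion`, here is a minimal sketch of the kind of aggregation stage the hybrid step relies on, built as a plain Python dictionary. The helper name `build_hybrid_stage`, the pipeline names `semantic` and `keyword`, the index names, the field names, and the placeholder query vector are all assumptions for illustration; the skill's real pipelines are not shown in this case study.

```python
def build_hybrid_stage(semantic_pipeline, keyword_pipeline, semantic_weight=0.7):
    """Build a $rankFusion stage that weights two sub-pipelines (illustrative)."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    # Rounding avoids float noise such as 0.30000000000000004.
    keyword_weight = round(1.0 - semantic_weight, 10)
    return {
        "$rankFusion": {
            "input": {
                "pipelines": {
                    "semantic": semantic_pipeline,
                    "keyword": keyword_pipeline,
                }
            },
            "combination": {
                "weights": {
                    "semantic": semantic_weight,
                    "keyword": keyword_weight,
                }
            },
            "scoreDetails": True,
        }
    }


# Placeholder sub-pipelines. Index names, paths, and the vector are assumed.
semantic = [
    {
        "$vectorSearch": {
            "index": "vector_index",      # hypothetical index name
            "path": "plot_embedding",     # assumed embedding field
            "queryVector": [0.0, 0.0, 0.0],  # placeholder, not a real embedding
            "numCandidates": 100,
            "limit": 20,
        }
    }
]
keyword = [
    {"$search": {"index": "default", "text": {"query": "space travel", "path": "plot"}}},
    {"$limit": 20},
]

default_stage = build_hybrid_stage(semantic, keyword)          # semantic 70%, keyword 30%
flipped_stage = build_hybrid_stage(semantic, keyword, 0.3)     # weights flipped
```

Running both stages against the same collection and printing the two result lists side by side is one way to reproduce the comparison described here.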

Rule System

What the agent always does, adapts, and never does

One of the principles I set for myself when building this: the skill should have an explicit behavioral contract. Not vibes. Actual rules. If a new scenario came up, I wanted a framework to evaluate it against, not a prompt I'd have to rewrite from scratch.

Always: Check before proceeding

MCP connection and dataset availability verified silently before anything else. No one fails mid-tutorial because we skipped preconditions.

Always: Explain the why

Before each operation, the agent explains what it's doing and why. Developers leave understanding the concepts, not just having followed instructions.

Always: Wait for READY

Index creation takes time. The agent holds until the index reports READY. Running a query against an index that is still building is a silent failure most tutorials don't catch.

Conditionally: Adapt cluster path

Routes to auto-embed or manual embed based on actual cluster tier. This was the original insight: the docs assumed M10+. Most people aren't on M10+.

Conditionally: Invite deeper exploration

Offers iteration, genre filters, weight-flipping — but only after the core experience has landed. You earn the advanced stuff.

Never: Dead-end the session

If something fails, there's always an alternative path. The agent doesn't leave someone staring at an error message with nowhere to go.
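One way to make a contract like this testable is to express each rule as a predicate over a trace of agent events. The sketch below is an assumption about how that could look, not the skill's implementation; the event names (`check_mcp`, `index_ready`, `offer_alternative`, and so on) and the `Rule` type are hypothetical, and only three of the rules are encoded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    kind: str                             # "always", "conditionally", or "never"
    name: str
    holds: Callable[[List[str]], bool]    # True when the trace respects the rule


def checks_come_first(trace):
    # Both precondition checks must appear before the first operation.
    first_op = next(
        (i for i, event in enumerate(trace) if event.startswith(("create_", "run_"))),
        len(trace),
    )
    return {"check_mcp", "check_dataset"} <= set(trace[:first_op])


def waits_for_ready(trace):
    # A query may only run after the index has reported READY.
    if "run_query" not in trace:
        return True
    return "index_ready" in trace and trace.index("index_ready") < trace.index("run_query")


def never_dead_ends(trace):
    # Every error must be followed immediately by an alternative path.
    return all(
        i + 1 < len(trace) and trace[i + 1] == "offer_alternative"
        for i, event in enumerate(trace)
        if event == "error"
    )


CONTRACT = [
    Rule("always", "Check before proceeding", checks_come_first),
    Rule("always", "Wait for READY", waits_for_ready),
    Rule("never", "Dead-end the session", never_dead_ends),
]


def violations(trace):
    """Return the names of the rules that a trace breaks."""
    return [rule.name for rule in CONTRACT if not rule.holds(trace)]


good = ["check_mcp", "check_dataset", "create_index", "index_ready", "run_query"]
bad = ["check_mcp", "check_dataset", "create_index", "run_query", "error"]
```

With the rules in this form, a new scenario is evaluated by recording its trace and calling `violations`, rather than by rereading a prompt.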

Process

How it actually got built

I want to be honest about what this process looked like, because it wasn't a tidy waterfall. It was: have an idea, build a rough version, run it against a bunch of scenarios, find the problems, fix them, repeat. The design and the building happened at the same time, by the same person.

The core sequence was:

01. Audit the existing docs

Mapped every friction point in the Atlas Search onboarding. Found where and why people were dropping out. The cluster tier problem jumped out immediately.

02. Draft the decision tree

Before writing any code, drafted the full routing logic — all paths, all branches, all fallbacks. This document was shared for alignment before implementation.

03. Write the rule contract

Specified the behavioral rules explicitly. Always/conditionally/never. This made testing tractable because I knew what to test against.

04. Scenario testing

Built a structured scenario set covering infrastructure variety, intent variety, and edge cases. Ran every version of the skill through it. Found real bugs.

05. Design the comparison moment

The weight-flipping comparison at hybrid search was treated as a product moment, not a feature. Designed the agent's framing to make it land.

06. Script export as closing artifact

Added a Python script at wrap-up so developers leave with something immediately usable. The tutorial becomes a starting point, not an endpoint.

One thing that shaped the whole approach: I kept asking "what does a developer walk away knowing?" Not what they did, but what they understand. That question changed how I wrote the agent's explanations at each step.

Testing

Scenarios that stress-tested the logic

Testing an AI agent skill is a bit like QA-ing a conversation. You can't cover every possible exchange, but you can design for the most likely failure modes. Here's a selection from the scenario set:

| Scenario | Condition | Result |
| --- | --- | --- |
| Free-tier user wants semantic search | No M10+ cluster | Rerouted to manual embed path correctly |
| User skips every optional step | Declines fuzzy, autocomplete, genre filter | Arrives at hybrid cleanly, no dead end |
| Dataset not loaded | sample_mflix absent | Holds, explains, and waits; doesn't proceed |
| User types a custom semantic query | Freeform input mid-flow | Runs it, explains results, continues |
| User wants keyword only | Declines hybrid at Step 7b | Offers Python script, wraps gracefully |
| Index creation stalls | Atlas index stuck in building state | Bug found, fixed: early version continued anyway |
| User flips hybrid weights | Requests keyword-heavy query | Re-runs with the 70/30 weights swapped, shows side-by-side results |
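A few of the rows above could be encoded as a data-driven matrix, so that every version of the skill is run against the same scenarios. This is a sketch under assumptions: `Scenario`, `run_matrix`, the condition keys, and the outcome labels are hypothetical, and the two stub agents stand in for the real skill purely to show how a matrix surfaces a regression such as the stalled-index bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    condition: dict    # illustrative description of the starting state
    expected: str      # illustrative label for the agent's next action


SCENARIOS = [
    Scenario(
        "Free-tier user wants semantic search",
        {"intent": "semantic", "m10_plus": False},
        "manual_embed",
    ),
    Scenario("Dataset not loaded", {"dataset_loaded": False}, "hold"),
    Scenario("Index creation stalls", {"index_status": "BUILDING"}, "hold"),
]


def early_agent(condition):
    """Stub for an early version: it never looks at the index status."""
    if not condition.get("dataset_loaded", True):
        return "hold"
    if condition.get("intent") == "semantic" and not condition.get("m10_plus", True):
        return "manual_embed"
    return "run_query"


def fixed_agent(condition):
    """Stub for the fixed version: holds unless the index is READY."""
    if condition.get("index_status", "READY") != "READY":
        return "hold"
    return early_agent(condition)


def run_matrix(agent, scenarios):
    """Return the names of scenarios where the agent's action is unexpected."""
    return [s.name for s in scenarios if agent(s.condition) != s.expected]


early_failures = run_matrix(early_agent, SCENARIOS)
fixed_failures = run_matrix(fixed_agent, SCENARIOS)
```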

The stalled index bug was the most important catch. An agent that quietly runs queries against a building index gives wrong results and the developer has no idea why. That's worse than failing loudly.
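A minimal sketch of the "fail loudly" behavior, assuming the index status can be read through some callable. The function `wait_for_ready`, the `IndexNotReady` exception, the timeout values, and the status strings are illustrative; with PyMongo the status could plausibly be read from the documents returned by `Collection.list_search_indexes()`, but that wiring is an assumption and is left out here.

```python
import time


class IndexNotReady(RuntimeError):
    """Raised instead of silently querying an index that is not READY."""


def wait_for_ready(get_status, timeout_s=300.0, poll_s=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "READY", or raise on failure/timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        # A failed or stalled build stops the flow; no query is attempted.
        if status == "FAILED" or clock() >= deadline:
            raise IndexNotReady(f"index status is {status!r}; not running queries")
        sleep(poll_s)


# Usage with a fake status source, so the example runs without a cluster.
_statuses = iter(["PENDING", "BUILDING", "READY"])
result = wait_for_ready(lambda: next(_statuses), sleep=lambda seconds: None)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test; a stalled index then shows up as an `IndexNotReady` error the agent can explain, rather than as puzzling query results.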

Outcomes

Where things stand

3 search paradigms covered in one session
30+ scenarios tested across iterations
100% of paths arrive at hybrid search

The skill is complete and currently moving through the publishing approval process. Every developer who goes through it arrives at a live hybrid search comparison, a working understanding of $rankFusion, and a runnable Python script to take with them.

The decision tree pattern itself became reusable. It's now a template for structuring future quickstart skills — which means this project produced a methodology, not just a single deliverable.

Reflection

What I actually learned

The biggest shift was realizing that designing for AI agents is fundamentally a systems design problem, not a copywriting problem. Writing better prompts got me maybe 20% of the way there. The other 80% was the tree, the rules, and the testing protocol.

Structure beats cleverness

A well-structured decision tree with boring, explicit rules outperforms a clever, freeform prompt every time. The agent's behavior became predictable and testable. That made iteration possible instead of chaotic.

Test like you mean it

I found a real bug — the stalled index issue — through scenario testing that I would never have caught in normal use. Building a test matrix before the skill was done changed what I found and when.

The "why" is the hardest part

Getting the agent to run queries is easy. Getting it to explain each step in a way that builds actual understanding, without sounding like a manual, took far more iteration than anything else.

Speed requires a clear constraint

I moved fast on this because I had one clear question anchoring every decision: "What does a developer walk away knowing?" When I got stuck, I'd go back to that. It cut a lot of noise.

AI agent design · Decision tree architecture · Developer experience · Conversational UX · MCP integration · Vector search · Hybrid search · Scenario-based testing · Skill specification