A Coding Agent Is Not a Prompt. It's a Stateful Query Loop.
In Part 1, I argued that the model isn't the product.
The harness is.
Now I want to zoom in one level and make a second claim:
A coding agent is not a prompt.
It's not even a prompt plus tools.
At the core, it's a query loop.
And if that loop isn't stateful, recovery-aware, and built for iteration, the rest of the system becomes fragile no matter how smart the model is.
The Popular Mental Model Is Too Simple
The common mental model for agents still looks like this:
- Build a big system prompt
- Give the model a tool list
- Let it think
- Execute tool calls
- Return an answer
That model is fine for explaining the idea to someone new.
It is terrible for explaining how real coding agents actually work.
Because it smuggles in a hidden assumption: that each user request is mostly one pass through the model.
In practice, a serious coding agent does something more like this:
- Enter loop state
- Prepare context
- Shrink or rewrite context if needed
- Call model
- Stream partial outputs
- Execute tools
- Update internal state
- Decide whether follow-up is needed
- Retry, compact, or recover if something went wrong
- Continue until a terminal condition is reached
That's not a prompt flow.
That's a state machine.
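The shape of that state machine can be sketched in a few lines of TypeScript. Everything here is hypothetical, my own stand-ins for the real loop: the model call and compaction are stubbed so only the structure shows.

```typescript
// Hypothetical sketch of a stateful query loop. The real loop calls a
// model and tools; here those are stubbed so the shape is visible.
type LoopState = {
  messages: string[];
  turnCount: number;
  compacted: boolean;
  done: boolean;
};

// One internal turn: shrink context if needed, "call the model",
// update state, and decide whether to continue.
function step(state: LoopState, maxTurns: number): LoopState {
  const next: LoopState = { ...state, turnCount: state.turnCount + 1 };
  // Stand-in for compaction: keep only the tail when context grows.
  if (next.messages.length > 4 && !next.compacted) {
    next.messages = next.messages.slice(-2);
    next.compacted = true;
  }
  next.messages = [...next.messages, `assistant turn ${next.turnCount}`];
  // Terminal condition: turn budget exhausted.
  if (next.turnCount >= maxTurns) next.done = true;
  return next;
}

function runLoop(initial: LoopState, maxTurns: number): LoopState {
  let state = initial;
  while (!state.done) state = step(state, maxTurns);
  return state;
}
```

Notice that nothing in this sketch is a prompt. The prompt would live inside the stubbed model call; the loop is what survives across calls.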
The Most Revealing File in the Snapshot
The file that made this clearest to me in the Claude Code snapshot was query.ts.
Not because it had one magical trick.
Because of what it assumed about the world.
It assumed:
- the session might need multiple internal turns
- tool results might need budgeting before they even enter the next call
- context might need to be snipped, micro-compacted, collapsed, or fully compacted
- streaming might fall back mid-flight
- prompt-too-long errors might be recoverable
- max-output-token failures might need dedicated recovery logic
- the turn might continue for reasons that should be tracked explicitly
That is the mindset of engineers who do not trust the happy path.
Which is exactly the mindset you want in an agent runtime.
Why the Loop Has to Carry State
The snapshot's query loop carries mutable state across iterations:
- messages
- compaction tracking
- output-token recovery count
- pending tool summaries
- stop-hook state
- turn count
- transition reason
That matters because agent behavior is path-dependent.
What the model should do next depends on:
- what tools were just called
- what results were already replaced or summarized
- whether compaction already happened
- whether a fallback path already fired
- whether a recovery attempt already failed
You can't express that well if every step pretends it's a clean, stateless call.
Stateless APIs are elegant.
Long-running coding work is not.
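Collected into one structure, that loop-carried state might look something like this. The field names are mine, illustrative only; they map onto the categories in the list above, not onto the snapshot's actual identifiers.

```typescript
// Illustrative shape of loop-carried state. These are not the
// snapshot's identifiers, just the categories from the list above.
type Message = { role: "user" | "assistant" | "tool"; content: string };

interface QueryLoopState {
  messages: Message[];                                    // the working transcript
  compaction: { count: number; lastTurn: number | null }; // compaction tracking
  outputTokenRecoveries: number;                          // max-output-token retries so far
  pendingToolSummaries: Map<string, string>;              // tool results awaiting summarization
  stopHookActive: boolean;                                // stop-hook state
  turnCount: number;
  transitionReason: "continue" | "tool_use" | "recovery" | "done";
}

const freshState: QueryLoopState = {
  messages: [],
  compaction: { count: 0, lastTurn: null },
  outputTokenRecoveries: 0,
  pendingToolSummaries: new Map(),
  stopHookActive: false,
  turnCount: 0,
  transitionReason: "continue",
};
```

Every branch in the loop reads or writes some field here. That is what path dependence looks like in code.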
Recovery Logic Is Not an Edge Case
One of the strongest patterns in the codebase is that recoverability is built into the main loop, not bolted on the side.
That shows up in a few ways.
1. Errors are sometimes withheld on purpose
If the model returns something recoverable (a prompt-too-long condition, a media-size issue, a max-output-tokens issue), the system doesn't necessarily surface it immediately.
Instead, it may hold the error back long enough for a recovery subsystem to try:
- context collapse
- reactive compaction
- truncation retry
That's a surprisingly deep design decision.
It means the runtime is optimizing for "keep the agent alive" rather than "surface every raw error as soon as possible."
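A sketch of that withhold-then-recover flow, using the three strategies from the list. The strategy implementations here are trivial stand-ins I wrote for shape; only the ordering and the "throw last" behavior are the point.

```typescript
// Sketch: hold a recoverable error back while recovery strategies run,
// and only surface it if every strategy fails. Stand-in implementations.
type Recovery = (ctx: string[]) => string[] | null;

const summarize = (ctx: string[]): string => `summary of ${ctx.length} messages`;

const strategies: Recovery[] = [
  // context collapse: drop everything but the recent tail
  ctx => (ctx.length > 3 ? ctx.slice(-3) : null),
  // reactive compaction: summarize history, keep the last message
  ctx => (ctx.length > 1 ? [summarize(ctx), ctx[ctx.length - 1]] : null),
  // truncation retry: hard-truncate the last message
  ctx => (ctx.length > 0 ? [ctx[ctx.length - 1].slice(0, 100)] : null),
];

function recoverOrThrow(ctx: string[], err: Error): string[] {
  for (const attempt of strategies) {
    const recovered = attempt(ctx);
    if (recovered !== null) return recovered; // keep the agent alive
  }
  throw err; // nothing worked; now the raw error surfaces
}
```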
2. Partial streamed messages may be tombstoned
If a streaming path falls back and leaves behind orphaned partial messages, the runtime explicitly emits tombstones to invalidate them.
Why?
Because a half-streamed assistant message is not harmless.
It can poison the transcript. It can create invalid tool-use structure. It can break later API calls if the system naively treats it as durable history.
That is exactly the sort of subtle failure mode you only learn by operating a real system.
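Here is a minimal version of the tombstone idea, my own reconstruction rather than the snapshot's code: append tombstone entries for incomplete messages, then filter against them when assembling the next call.

```typescript
// Sketch: mark half-streamed messages dead so they never re-enter
// the model's view as durable history.
type Entry =
  | { kind: "message"; id: string; complete: boolean; content: string }
  | { kind: "tombstone"; invalidates: string };

// Emit a tombstone for every incomplete message left by a fallback.
function tombstoneOrphans(transcript: Entry[]): Entry[] {
  const out: Entry[] = [...transcript];
  for (const e of transcript) {
    if (e.kind === "message" && !e.complete) {
      out.push({ kind: "tombstone", invalidates: e.id });
    }
  }
  return out;
}

// When assembling the next API call, drop anything a tombstone killed.
function durableHistory(transcript: Entry[]): Entry[] {
  const dead = new Set<string>();
  for (const e of transcript) {
    if (e.kind === "tombstone") dead.add(e.invalidates);
  }
  return transcript.filter(e => e.kind === "message" && !dead.has(e.id));
}
```

The key design choice: the transcript is append-only, so invalidation is a new entry, not a mutation. That keeps the history auditable even after recovery.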
3. Blocking is conditional, not absolute
The loop does not simply say "context too large, fail."
It checks:
- did compaction already happen?
- is reactive recovery enabled?
- does context collapse own this overflow case?
- is this a compact/session-memory query that would deadlock if blocked here?
That kind of conditionality is ugly if you want elegance.
It's excellent if you want the product to survive.
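That checklist translates almost directly into a decision function. The flag names and return values below are hypothetical, chosen to mirror the four questions above.

```typescript
// Sketch of conditional blocking on context overflow. The four checks
// mirror the checklist above; the names are illustrative, not the snapshot's.
interface OverflowContext {
  alreadyCompacted: boolean;
  reactiveRecoveryEnabled: boolean;
  collapseOwnsOverflow: boolean;
  isCompactionQuery: boolean; // compact/session-memory queries must not block themselves
}

type OverflowDecision = "pass_through" | "delegate" | "recover" | "block";

function decideOnOverflow(ctx: OverflowContext): OverflowDecision {
  if (ctx.isCompactionQuery) return "pass_through"; // blocking here would deadlock
  if (ctx.collapseOwnsOverflow) return "delegate";  // context collapse handles it
  if (ctx.reactiveRecoveryEnabled && !ctx.alreadyCompacted) return "recover";
  return "block"; // out of options: fail for real
}
```

Note the ordering: the deadlock guard comes first, and the hard failure is the last resort, not the default.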
Tool Execution Is Part of the Loop, Not a Side Effect
Another thing the snapshot makes obvious: tool execution is not outside the loop.
It is the loop.
The publicly posted source snapshot reveals a tool execution pipeline far more involved than most developers expect. The system doesn't just hand tool calls to a helper and wait. It:
- validates schema
- runs tool-specific input validation
- starts speculative classifiers that predict permission outcomes before the tool actually runs
- runs pre-tool hooks
- resolves permission decisions
- executes the tool
- maps tool results
- runs post-tool hooks
- possibly modifies future context through a context modifier
That means the next model turn is not simply "previous messages plus new tool result."
It's "previous messages plus new tool result, after passing through a policy and transformation pipeline."
Again: state machine, not prompt wrapper.
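The stages in that list can be sketched as a single pipeline function. Every helper below is a trivial placeholder I wrote to show the ordering; none of it is the snapshot's implementation.

```typescript
// Sketch of the tool-execution pipeline. Stage order follows the list
// above; every helper is a placeholder.
type ToolCall = { name: string; input: Record<string, unknown> };
type ToolResult = { content: string; blocked?: boolean };

const validateSchema = (c: ToolCall) => c.name.length > 0;
const validateInput = (_c: ToolCall) => true;                  // tool-specific validation
const predictPermission = (_c: ToolCall) => "allow" as const;  // speculative classifier
const runPreToolHooks = (_c: ToolCall) => {};
const resolvePermission = (_c: ToolCall, p: "allow" | "deny") => p === "allow";
const runTool = (c: ToolCall) => `ran ${c.name}`;
const mapResult = (raw: string): ToolResult => ({ content: raw });
const runPostToolHooks = (_c: ToolCall, _r: ToolResult) => {};

function executeToolCall(call: ToolCall): ToolResult {
  if (!validateSchema(call)) return { content: "invalid schema", blocked: true };
  if (!validateInput(call)) return { content: "invalid input", blocked: true };
  const predicted = predictPermission(call); // runs before the tool does
  runPreToolHooks(call);
  if (!resolvePermission(call, predicted)) {
    return { content: "permission denied", blocked: true };
  }
  const result = mapResult(runTool(call));
  runPostToolHooks(call, result);
  return result; // a context modifier may still rewrite this before the next turn
}
```

Even in this toy version, the tool call crosses five policy boundaries before its result touches the transcript.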
Why This Matters for Building Your Own Agents
If you're designing your own coding agent, this is one of the most important perspective shifts you can make.
Do not ask:
"What should my prompt say?"
Start by asking:
"What state do I need to carry between internal turns?"
That question forces better design.
Because once you answer it honestly, you notice all the things your runtime has to remember:
- which tools are still in progress
- which errors were already retried
- what parts of history have been summarized
- what can still be trusted in the transcript
- which worker or sidechain owns what state
Those are runtime questions.
And they drive the product more than prompt cleverness does.
The Model's Job Shrinks in a Good Way
Once the loop becomes more capable, the model's job actually becomes narrower.
That's a good thing.
The runtime takes over the parts that should be mechanical:
- error recovery
- transcript repair
- approval mediation
- context maintenance
- event ordering
And the model gets to focus on what it's actually good at:
- choosing actions
- synthesizing information
- generating code
- planning next steps
This is the same thing good software architecture does for humans.
You don't want your engineers debugging the scheduler every time they make a product change.
You want infrastructure to absorb infrastructure problems.
Same story here.
The Best Analogy I Found
The best analogy I found while reading this code was not "chatbot with tools."
It was "event loop with a language model inside it."
That sounds weird at first, but it fits.
The runtime continuously:
- reads state
- processes events
- dispatches work
- mutates internal state
- handles exceptional paths
- decides whether to continue
That is much closer to an event loop than to a one-shot question-answer interaction.
And once you see it that way, a lot of other decisions make sense:
- why transcripts matter so much
- why compaction is layered
- why tool calls are centralized
- why recovery logic lives in the heart of the loop
The Most Actionable Lesson
If I had to reduce this whole post to one practical lesson, it would be this:
When your agent starts to feel unreliable, don't only look at the prompt.
Look at the loop.
Ask:
- What state isn't being carried forward correctly?
- What failures aren't classified well enough for recovery?
- What partial outputs are polluting the transcript?
- What retry logic is missing?
- What context transformation should happen before the next model call?
Most real agent bugs are runtime bugs wearing prompt-shaped masks.
One Step Further
And once you accept that a coding agent is really a stateful query loop, you run into the next uncomfortable truth:
performance and cost are not second-order details.
They're structural.
Which brings me to the strangest and most illuminating part of the Claude Code snapshot:
forked workers built around fake tool results so multiple children can share an identical cache prefix.
That sounds bizarre.
It's also a sign that prompt cache, in serious agent systems, is not an optimization.
It's architecture.
That's Part 3.