A Coding Agent Is Not a Prompt. It's a Stateful Query Loop.
In Part 1, I argued that the model isn't the product.
The harness is.
Now I want to zoom in one level and make a second claim:
A coding agent is not a prompt.
It's not even a prompt plus tools.
At the core, it's a query loop.
And if that loop isn't stateful, recovery-aware, and built for iteration, the rest of the system becomes fragile no matter how smart the model is.
The Popular Mental Model Is Too Simple
The common mental model for agents still looks like this:
- Build a big system prompt
- Give the model a tool list
- Let it think
- Execute tool calls
- Return an answer
That model is fine for explaining the idea to someone new.
It is terrible for explaining how real coding agents actually work.
Because it smuggles in a hidden assumption: that each user request is mostly one pass through the model.
In practice, a serious coding agent does something more like this:
- Enter loop state
- Prepare context
- Shrink or rewrite context if needed
- Call model
- Stream partial outputs
- Execute tools
- Update internal state
- Decide whether follow-up is needed
- Retry, compact, or recover if something went wrong
- Continue until a terminal condition is reached
That's not a prompt flow.
That's a state machine.
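The shape of that state machine can be sketched in a few lines of TypeScript. Everything here is hypothetical, my own stand-ins for the real loop: the model call and compaction are stubbed so only the structure shows.

```typescript
// Hypothetical sketch of a stateful query loop. The real loop calls a
// model and tools; here those are stubbed so the shape is visible.
type LoopState = {
  messages: string[];
  turnCount: number;
  compacted: boolean;
  done: boolean;
};

// One internal turn: shrink context if needed, "call the model",
// update state, and decide whether to continue.
function step(state: LoopState, maxTurns: number): LoopState {
  const next: LoopState = { ...state, turnCount: state.turnCount + 1 };
  // Stand-in for compaction: keep only the tail when context grows.
  if (next.messages.length > 4 && !next.compacted) {
    next.messages = next.messages.slice(-2);
    next.compacted = true;
  }
  next.messages = [...next.messages, `assistant turn ${next.turnCount}`];
  // Terminal condition: turn budget exhausted.
  if (next.turnCount >= maxTurns) next.done = true;
  return next;
}

function runLoop(initial: LoopState, maxTurns: number): LoopState {
  let state = initial;
  while (!state.done) state = step(state, maxTurns);
  return state;
}
```

Notice that nothing in this sketch is a prompt. The prompt would live inside the stubbed model call; the loop is what survives across calls.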
The Most Revealing File in the Snapshot
The file that made this clearest to me in the Claude Code snapshot was query.ts.
Not because it had one magical trick.
Because of what it assumed about the world.
It assumed:
- the session might need multiple internal turns
- tool results might need budgeting before they even enter the next call
- context might need to be snipped, micro-compacted, collapsed, or fully compacted
- streaming might fall back mid-flight
- prompt-too-long errors might be recoverable
- max-output-token failures might need dedicated recovery logic
- the turn might continue for reasons that should be tracked explicitly
That is the mindset of engineers who do not trust the happy path.
Which is exactly the mindset you want in an agent runtime.
Why the Loop Has to Carry State
The snapshot's query loop carries mutable state across iterations:
- messages
- compaction tracking
- output-token recovery count
- pending tool summaries
- stop-hook state
- turn count
- transition reason
That matters because agent behavior is path-dependent.
What the model should do next depends on:
- what tools were just called
- what results were already replaced or summarized
- whether compaction already happened
- whether a fallback path already fired
- whether a recovery attempt already failed
You can't express that well if every step pretends it's a clean, stateless call.
Stateless APIs are elegant.
Long-running coding work is not.
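Collected into one structure, that loop-carried state might look something like this. The field names are mine, illustrative only; they map onto the categories in the list above, not onto the snapshot's actual identifiers.

```typescript
// Illustrative shape of loop-carried state. These are not the
// snapshot's identifiers, just the categories from the list above.
type Message = { role: "user" | "assistant" | "tool"; content: string };

interface QueryLoopState {
  messages: Message[];                                    // the working transcript
  compaction: { count: number; lastTurn: number | null }; // compaction tracking
  outputTokenRecoveries: number;                          // max-output-token retries so far
  pendingToolSummaries: Map<string, string>;              // tool results awaiting summarization
  stopHookActive: boolean;                                // stop-hook state
  turnCount: number;
  transitionReason: "continue" | "tool_use" | "recovery" | "done";
}

const freshState: QueryLoopState = {
  messages: [],
  compaction: { count: 0, lastTurn: null },
  outputTokenRecoveries: 0,
  pendingToolSummaries: new Map(),
  stopHookActive: false,
  turnCount: 0,
  transitionReason: "continue",
};
```

Every branch in the loop reads or writes some field here. That is what path dependence looks like in code.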
Recovery Logic Is Not an Edge Case
One of the strongest patterns in the codebase is that recoverability is built into the main loop, not bolted on the side.
That shows up in a few ways.
1. Errors are sometimes withheld on purpose
If the model returns something recoverable (a prompt-too-long condition, a media-size issue, a max-output-tokens issue), the system doesn't necessarily surface it immediately.
Instead, it may hold the error back long enough for a recovery subsystem to try:
- context collapse
- reactive compaction
- truncation retry
That's a surprisingly deep design decision.
It means the runtime is optimizing for "keep the agent alive" rather than "surface every raw error as soon as possible."
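A sketch of that withhold-then-recover flow, using the three strategies from the list. The strategy implementations here are trivial stand-ins I wrote for shape; only the ordering and the "throw last" behavior are the point.

```typescript
// Sketch: hold a recoverable error back while recovery strategies run,
// and only surface it if every strategy fails. Stand-in implementations.
type Recovery = (ctx: string[]) => string[] | null;

const summarize = (ctx: string[]): string => `summary of ${ctx.length} messages`;

const strategies: Recovery[] = [
  // context collapse: drop everything but the recent tail
  ctx => (ctx.length > 3 ? ctx.slice(-3) : null),
  // reactive compaction: summarize history, keep the last message
  ctx => (ctx.length > 1 ? [summarize(ctx), ctx[ctx.length - 1]] : null),
  // truncation retry: hard-truncate the last message
  ctx => (ctx.length > 0 ? [ctx[ctx.length - 1].slice(0, 100)] : null),
];

function recoverOrThrow(ctx: string[], err: Error): string[] {
  for (const attempt of strategies) {
    const recovered = attempt(ctx);
    if (recovered !== null) return recovered; // keep the agent alive
  }
  throw err; // nothing worked; now the raw error surfaces
}
```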
2. Partial streamed messages may be tombstoned
If a streaming path falls back and leaves behind orphaned partial messages, the runtime explicitly emits tombstones to invalidate them.
Why?
Because a half-streamed assistant message is not harmless.
It can poison the transcript. It can create invalid tool-use structure. It can break later API calls if the system naively treats it as durable history.
That is exactly the sort of subtle failure mode you only learn by operating a real system.
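Here is a minimal version of the tombstone idea, my own reconstruction rather than the snapshot's code: append tombstone entries for incomplete messages, then filter against them when assembling the next call.

```typescript
// Sketch: mark half-streamed messages dead so they never re-enter
// the model's view as durable history.
type Entry =
  | { kind: "message"; id: string; complete: boolean; content: string }
  | { kind: "tombstone"; invalidates: string };

// Emit a tombstone for every incomplete message left by a fallback.
function tombstoneOrphans(transcript: Entry[]): Entry[] {
  const out: Entry[] = [...transcript];
  for (const e of transcript) {
    if (e.kind === "message" && !e.complete) {
      out.push({ kind: "tombstone", invalidates: e.id });
    }
  }
  return out;
}

// When assembling the next API call, drop anything a tombstone killed.
function durableHistory(transcript: Entry[]): Entry[] {
  const dead = new Set<string>();
  for (const e of transcript) {
    if (e.kind === "tombstone") dead.add(e.invalidates);
  }
  return transcript.filter(e => e.kind === "message" && !dead.has(e.id));
}
```

The key design choice: the transcript is append-only, so invalidation is a new entry, not a mutation. That keeps the history auditable even after recovery.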
3. Blocking is conditional, not absolute
The loop does not simply say "context too large, fail."
It checks:
- did compaction already happen?
- is reactive recovery enabled?
- does context collapse own this overflow case?
- is this a compact/session-memory query that would deadlock if blocked here?
That kind of conditionality is ugly if you want elegance.
It's excellent if you want the product to survive.
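That checklist translates almost directly into a decision function. The flag names and return values below are hypothetical, chosen to mirror the four questions above.

```typescript
// Sketch of conditional blocking on context overflow. The four checks
// mirror the checklist above; the names are illustrative, not the snapshot's.
interface OverflowContext {
  alreadyCompacted: boolean;
  reactiveRecoveryEnabled: boolean;
  collapseOwnsOverflow: boolean;
  isCompactionQuery: boolean; // compact/session-memory queries must not block themselves
}

type OverflowDecision = "pass_through" | "delegate" | "recover" | "block";

function decideOnOverflow(ctx: OverflowContext): OverflowDecision {
  if (ctx.isCompactionQuery) return "pass_through"; // blocking here would deadlock
  if (ctx.collapseOwnsOverflow) return "delegate";  // context collapse handles it
  if (ctx.reactiveRecoveryEnabled && !ctx.alreadyCompacted) return "recover";
  return "block"; // out of options: fail for real
}
```

Note the ordering: the deadlock guard comes first, and the hard failure is the last resort, not the default.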
Tool Execution Is Part of the Loop, Not a Side Effect
Another thing the snapshot makes obvious: tool execution is not outside the loop.
It is the loop.
The publicly posted source snapshot reveals a tool execution pipeline far more involved than most developers expect. The system doesn't just hand tool calls to a helper and wait. It:
- validates schema
- runs tool-specific input validation
- starts speculative classifiers that predict permission outcomes before the tool actually runs
- runs pre-tool hooks
- resolves permission decisions
- executes the tool
- maps tool results
- runs post-tool hooks
- possibly modifies future context through a context modifier
That means the next model turn is not simply "previous messages plus new tool result."
It's "previous messages plus new tool result, after passing through a policy and transformation pipeline."
Again: state machine, not prompt wrapper.
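The stages in that list can be sketched as a single pipeline function. Every helper below is a trivial placeholder I wrote to show the ordering; none of it is the snapshot's implementation.

```typescript
// Sketch of the tool-execution pipeline. Stage order follows the list
// above; every helper is a placeholder.
type ToolCall = { name: string; input: Record<string, unknown> };
type ToolResult = { content: string; blocked?: boolean };

const validateSchema = (c: ToolCall) => c.name.length > 0;
const validateInput = (_c: ToolCall) => true;                  // tool-specific validation
const predictPermission = (_c: ToolCall) => "allow" as const;  // speculative classifier
const runPreToolHooks = (_c: ToolCall) => {};
const resolvePermission = (_c: ToolCall, p: "allow" | "deny") => p === "allow";
const runTool = (c: ToolCall) => `ran ${c.name}`;
const mapResult = (raw: string): ToolResult => ({ content: raw });
const runPostToolHooks = (_c: ToolCall, _r: ToolResult) => {};

function executeToolCall(call: ToolCall): ToolResult {
  if (!validateSchema(call)) return { content: "invalid schema", blocked: true };
  if (!validateInput(call)) return { content: "invalid input", blocked: true };
  const predicted = predictPermission(call); // runs before the tool does
  runPreToolHooks(call);
  if (!resolvePermission(call, predicted)) {
    return { content: "permission denied", blocked: true };
  }
  const result = mapResult(runTool(call));
  runPostToolHooks(call, result);
  return result; // a context modifier may still rewrite this before the next turn
}
```

Even in this toy version, the tool call crosses five policy boundaries before its result touches the transcript.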
Why This Matters for Building Your Own Agents
If you're designing your own coding agent, this is one of the most important perspective shifts you can make.
Do not ask:
"What should my prompt say?"
Start by asking:
"What state do I need to carry between internal turns?"
That question forces better design.
Because once you answer it honestly, you notice all the things your runtime has to remember:
- which tools are still in progress
- which errors were already retried
- what parts of history have been summarized
- what can still be trusted in the transcript
- which worker or sidechain owns what state
Those are runtime questions.
And they drive the product more than prompt cleverness does.
The Model's Job Shrinks in a Good Way
Once the loop becomes more capable, the model's job actually becomes narrower.
That's a good thing.
The runtime takes over the parts that should be mechanical:
- error recovery
- transcript repair
- approval mediation
- context maintenance
- event ordering
And the model gets to focus on what it's actually good at:
- choosing actions
- synthesizing information
- generating code
- planning next steps
This is the same thing good software architecture does for humans.
You don't want your engineers debugging the scheduler every time they make a product change.
You want infrastructure to absorb infrastructure problems.
Same story here.
The Best Analogy I Found
The best analogy I found while reading this code was not "chatbot with tools."
It was "event loop with a language model inside it."
That sounds weird at first, but it fits.
The runtime continuously:
- reads state
- processes events
- dispatches work
- mutates internal state
- handles exceptional paths
- decides whether to continue
That is much closer to an event loop than to a one-shot question-answer interaction.
And once you see it that way, a lot of other decisions make sense:
- why transcripts matter so much
- why compaction is layered
- why tool calls are centralized
- why recovery logic lives in the heart of the loop
The Most Actionable Lesson
If I had to reduce this whole post to one practical lesson, it would be this:
When your agent starts to feel unreliable, don't only look at the prompt.
Look at the loop.
Ask:
- What state isn't being carried forward correctly?
- What failures aren't classified well enough for recovery?
- What partial outputs are polluting the transcript?
- What retry logic is missing?
- What context transformation should happen before the next model call?
Most real agent bugs are runtime bugs wearing prompt-shaped masks.
One Step Further
And once you accept that a coding agent is really a stateful query loop, you run into the next uncomfortable truth:
performance and cost are not second-order details.
They're structural.
Which brings me to the strangest and most illuminating part of the Claude Code snapshot:
forked workers built around fake tool results so multiple children can share an identical cache prefix.
That sounds bizarre.
It's also a sign that prompt cache, in serious agent systems, is not an optimization.
It's architecture.
That's Part 3.