The Model Isn't the Product. The Harness Is.
Most people still talk about coding agents as if the whole story is the model.
Which model are you using? How long is the context window? How good is it at editing code? Can it use tools?
Those questions matter. But after spending the last day auditing a publicly posted source snapshot that claims to be Claude Code, plus re-reading OpenAI's essays on harness engineering and the Codex harness, I think they're the wrong center of gravity.
The model is not the product.
The harness is the product.
The Brain Is the Easy Part to Notice
A coding agent is easy to imagine if you stop at the model layer.
You have:
- a prompt
- some tools
- a context window
- a model that can reason
That gets you surprisingly far. It can read files. It can propose edits. It can run tests. It can explain what it's doing. If you've only used an agent for a few quick tasks, that feels like the whole thing.
It isn't.
Because the moment you ask the agent to do anything that lasts longer than a toy session, all the boring questions crash through the wall:
- What happens when the context gets too big?
- What happens when the websocket drops halfway through a tool call?
- What happens when the user has to approve something from another client surface?
- What happens when a child worker is still running after the parent session dies?
- What happens when the model streams a partial answer, then falls back, and now the transcript contains invalid intermediate state?
- What happens when the session disappears and comes back three hours later?
That is where products are made or broken.
Not on the benchmark page. In the harness.
What I Mean by "Harness"
OpenAI's wording is useful here.
In the harness engineering essay, the emphasis is not "we found the perfect model." The emphasis is that the engineering work moved outward:
- design the environment
- specify intent
- build feedback loops
- give the agent the right visibility into the system
And in the Codex harness essay, "the harness" becomes even more concrete. It is the runtime interface around Codex:
- threads
- turns
- items
- approvals
- event notifications
- reconnectable sessions
- protocol boundaries between clients and the core runtime
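That list maps naturally onto a small data model. Here's a toy sketch; the names (Thread, Turn, Item) echo the essay's vocabulary, but the code is my own invention, not any real Codex or Claude Code API:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data model: a thread holds turns, a turn holds items,
# and clients subscribe to event notifications as items land.
@dataclass
class Item:
    kind: str      # e.g. "message", "tool_call", "approval_request"
    payload: dict

@dataclass
class Turn:
    items: list[Item] = field(default_factory=list)

@dataclass
class Thread:
    turns: list[Turn] = field(default_factory=list)
    listeners: list[Callable[[str, Item], None]] = field(default_factory=list)

    def start_turn(self) -> Turn:
        turn = Turn()
        self.turns.append(turn)
        return turn

    def emit(self, turn: Turn, item: Item) -> None:
        turn.items.append(item)
        for notify in self.listeners:   # event notifications out to clients
            notify("item.added", item)

# Demo: a client surface observing a turn as it unfolds.
events = []
thread = Thread()
thread.listeners.append(lambda kind, item: events.append((kind, item.kind)))
turn = thread.start_turn()
thread.emit(turn, Item("message", {"text": "hello"}))
thread.emit(turn, Item("tool_call", {"name": "read_file"}))
```

The point of even a toy version: once items are first-class objects flowing through an event channel, approvals, replays, and multiple client surfaces all become consumers of the same stream.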
That matches what I found in the Claude Code snapshot almost exactly.
Not in naming. In shape.
The Snapshot's Most Important Lesson
The strongest thing in the repo is not a fancy tool.
It is not a hidden prompt.
It is not some clever magic buried in one function.
It is the amount of engineering dedicated to keeping long-running agent work alive.
The codebase has entire subsystems for:
- context compaction
- prompt-cache preservation
- sidechain transcripts
- background observer agents
- permission mediation
- worker identity
- task state
- replay after reconnect
- environment re-registration
- websocket deduplication
- token refresh and work-item heartbeat
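To make one of those subsystems concrete, here's a minimal sketch of what context compaction can look like. Every name here is invented for illustration: once the transcript exceeds a token budget, the oldest unpinned messages fold into a summary while pinned items (task state, plan attachments) survive verbatim.

```python
def compact(messages, budget, count_tokens=len):
    """Fold the oldest unpinned messages into a summary once over budget.

    `messages` is a list of dicts with "text" and an optional "pinned"
    flag; everything here is an illustrative stand-in, not a real API.
    """
    total = sum(count_tokens(m["text"]) for m in messages)
    if total <= budget:
        return messages
    kept, dropped = [], []
    for m in messages:
        if m.get("pinned"):
            kept.append(m)              # task state / plans survive verbatim
        elif total > budget:
            dropped.append(m)           # oldest unpinned messages go first
            total -= count_tokens(m["text"])
        else:
            kept.append(m)
    summary = {"text": "[summary of %d earlier messages]" % len(dropped),
               "pinned": False}
    return [summary] + kept
```

A real implementation has to do far more (preserve prompt-cache prefixes, carry tool deltas, summarize with the model itself), but the invariant is the same: compaction is selective, not a blind truncation.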
That is not "extra plumbing."
That is the product.
The Query Loop Alone Isn't Enough
The main query loop in the snapshot is already more sophisticated than most people expect. It is not "send one request, get one answer." It's a state machine that keeps mutating its own state, running tools, compacting context, absorbing recoverable errors instead of surfacing them, retrying, and following up.
But even that is still just the inner engine.
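The shape of that inner engine can be sketched in a few lines. This is a deliberately toy version under my own assumptions, not the snapshot's actual loop:

```python
def query_loop(model, tools, transcript, max_steps=10):
    """Toy inner loop: call the model, run requested tools, feed results
    back into the transcript, retry on transient errors.

    `model` returns either {"type": "final", "text": ...} or
    {"type": "tool_call", "name": ..., "args": ...}; all names invented.
    """
    for _ in range(max_steps):
        try:
            step = model(transcript)
        except TimeoutError:        # recoverable: retry, don't surface it
            continue
        if step["type"] == "final":
            return step["text"]
        if step["type"] == "tool_call":
            result = tools[step["name"]](**step.get("args", {}))
            transcript.append({"role": "tool", "name": step["name"],
                               "content": result})
    raise RuntimeError("step budget exhausted")

# Demo with a fake model that requests one tool, then finishes.
def fake_model(transcript):
    if any(m.get("role") == "tool" for m in transcript):
        return {"type": "final", "text": "done"}
    return {"type": "tool_call", "name": "ls", "args": {}}

out = query_loop(fake_model, {"ls": lambda: "README.md"},
                 [{"role": "user", "content": "list files"}])
```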
Around it, the system has to answer questions like:
- How do tools ask for permission?
- How do permissions propagate across subagents and remote clients?
- How do you keep transcripts valid if streamed output becomes stale?
- How do you let workers fork without destroying prompt-cache identity?
- How do you reconstruct a session after a reconnect?
The answer in this repo is not "hope the model figures it out."
The answer is a lot of explicit runtime machinery.
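For the permission questions alone, a minimal version of that machinery might look like this. The `PermissionMediator` class and its approver callback are illustrative stand-ins, not code from the snapshot:

```python
class PermissionMediator:
    """Toy permission gate: tool calls are checked against granted rules;
    unknown requests escalate to an approver callback, standing in for a
    human on some client surface. Purely illustrative names.
    """
    def __init__(self, approver):
        self.approver = approver
        self.granted: set[str] = set()

    def check(self, tool: str, target: str) -> bool:
        rule = f"{tool}:{target}"
        if rule in self.granted:
            return True                  # previously approved: no re-prompt
        if self.approver(tool, target):  # escalate to the client surface
            self.granted.add(rule)       # remember for subagents / replays
            return True
        return False

# Demo: approvals are asked once, cached, and denials stay denials.
asked = []
def console_approver(tool, target):
    asked.append((tool, target))
    return target.startswith("src/")

mediator = PermissionMediator(console_approver)
ok1 = mediator.check("edit", "src/main.py")   # escalated, approved
ok2 = mediator.check("edit", "src/main.py")   # served from cache
ok3 = mediator.check("edit", "/etc/passwd")   # escalated, denied
```

The caching step is where the "propagate across subagents and remote clients" problem hides: once a grant is state, it has to survive forks, reconnects, and replays like any other state.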
This Is Why Most Agent Demos Don't Generalize
A lot of agent demos are effectively:
- Give the model a tool list.
- Let it call tools.
- Print the result.
That demo can be genuinely impressive.
It can also collapse the second you introduce:
- more than one agent
- more than one client
- real file edits
- long sessions
- partial failures
- approvals
- network instability
The coding agent snapshot I audited looks like it was built by people who had already stepped on all of those rakes.
The structure of the code tells the story:
- transcripts are written not for analytics, but for correctness
- duplicate control responses are explicitly suppressed
- reconnect logic carries sequence numbers across transport swaps
- compaction preserves not just summary text, but task state, plan attachments, skill state, and tool deltas
- background agents continuously maintain memory and summarize worker progress
That's not the architecture of a chatbot. That's the architecture of an operating environment.
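One of those mechanisms, sequence-numbered replay across reconnects, is simple to sketch in isolation. Everything below is an invented toy, not the snapshot's implementation:

```python
class ReplayBuffer:
    """Toy reconnect machinery: every event gets a monotonic sequence
    number; a client that reconnects reports the last number it saw and
    receives only the gap, so transport swaps don't drop or duplicate
    events. All names are illustrative.
    """
    def __init__(self):
        self.seq = 0
        self.events: list[tuple[int, str]] = []

    def publish(self, event: str) -> int:
        self.seq += 1
        self.events.append((self.seq, event))
        return self.seq

    def replay_from(self, last_seen: int) -> list[tuple[int, str]]:
        # Everything after the client's high-water mark, in order.
        return [e for e in self.events if e[0] > last_seen]

# Demo: a client that saw event 1 reconnects and catches up.
buf = ReplayBuffer()
for ev in ("turn.started", "item.added", "turn.completed"):
    buf.publish(ev)
missed = buf.replay_from(last_seen=1)
```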
The Real Product Surface
Once you see the harness clearly, the product starts to look different.
The surface is no longer just "chat with an AI."
It becomes:
- an event protocol
- an execution environment
- a permission system
- a memory system
- a worker topology
- a transport layer
- a recovery model
Which means the core design questions also change.
Instead of asking:
"How smart is the model?"
You start asking:
"What environment makes the model useful for six hours instead of six minutes?"
That is a much better question.
Why This Changes How I Think About Building Agents
The biggest shift for me after reading the snapshot was this:
I stopped thinking of agent products as "AI apps with tools."
I started thinking of them as distributed systems with a model in the middle.
That sounds obvious in hindsight, but it has real consequences.
If you're building a serious coding agent, the hardest parts are probably not:
- adding another tool
- tweaking a prompt
- swapping one frontier model for another
The hardest parts are probably:
- preserving state across failure
- deciding which workers should share context and which should isolate
- making approvals work across surfaces
- stopping context growth from poisoning the session
- exposing enough system state that the agent can debug itself
That's harness work.
And it's the part people under-budget until the product gets real users.
The Deepest Takeaway
The model is the part that reasons.
The harness is the part that makes the reasoning usable.
One gives you intelligence. The other gives you continuity.
One can write code. The other can survive the conditions under which real software gets written.
That's the difference between a compelling demo and an actual product.
In Part 2, I'll zoom in on the core loop itself, because the second big lesson from this codebase is that a coding agent is not a prompt template with tools attached.
It's a stateful query loop with recovery logic everywhere.