The Model Isn't the Product. The Harness Is.
Most people still talk about coding agents as if the whole story is the model.
Which model are you using? How long is the context window? How good is it at editing code? Can it use tools?
Those questions matter. But after spending the last day auditing a publicly posted source snapshot that claims to be Claude Code, plus re-reading OpenAI's essays on harness engineering and the Codex harness, I think they're the wrong center of gravity.
The model is not the product.
The harness is the product.
The Brain Is the Easy Part to Notice
A coding agent is easy to imagine if you stop at the model layer.
You have:
- a prompt
- some tools
- a context window
- a model that can reason
That gets you surprisingly far. It can read files. It can propose edits. It can run tests. It can explain what it's doing. If you've only used an agent for a few quick tasks, that feels like the whole thing.
It isn't.
Because the moment you ask the agent to do anything that lasts longer than a toy session, all the boring questions crash through the wall:
- What happens when the context gets too big?
- What happens when the websocket drops halfway through a tool call?
- What happens when the user has to approve something from another client surface?
- What happens when a child worker is still running after the parent session dies?
- What happens when the model streams a partial answer, then falls back, and now the transcript contains invalid intermediate state?
- What happens when the session disappears and comes back three hours later?
That is where products are made or broken.
Not on the benchmark page. In the harness.
What I Mean by "Harness"
OpenAI's wording is useful here.
In the harness engineering essay, the emphasis is not "we found the perfect model." The emphasis is that the engineering work moved outward:
- design the environment
- specify intent
- build feedback loops
- give the agent the right visibility into the system
And in the Codex harness essay, "the harness" becomes even more concrete. It is the runtime interface around Codex:
- threads
- turns
- items
- approvals
- event notifications
- reconnectable sessions
- protocol boundaries between clients and the core runtime
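That list maps naturally onto a small data model. Here's a toy sketch; the names (Thread, Turn, Item) echo the essay's vocabulary, but the code is my own invention, not any real Codex or Claude Code API:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data model: a thread holds turns, a turn holds items,
# and clients subscribe to event notifications as items land.
@dataclass
class Item:
    kind: str      # e.g. "message", "tool_call", "approval_request"
    payload: dict

@dataclass
class Turn:
    items: list[Item] = field(default_factory=list)

@dataclass
class Thread:
    turns: list[Turn] = field(default_factory=list)
    listeners: list[Callable[[str, Item], None]] = field(default_factory=list)

    def start_turn(self) -> Turn:
        turn = Turn()
        self.turns.append(turn)
        return turn

    def emit(self, turn: Turn, item: Item) -> None:
        turn.items.append(item)
        for notify in self.listeners:   # event notifications out to clients
            notify("item.added", item)

# Demo: a client surface observing a turn as it unfolds.
events = []
thread = Thread()
thread.listeners.append(lambda kind, item: events.append((kind, item.kind)))
turn = thread.start_turn()
thread.emit(turn, Item("message", {"text": "hello"}))
thread.emit(turn, Item("tool_call", {"name": "read_file"}))
```

The point of even a toy version: once items are first-class objects flowing through an event channel, approvals, replays, and multiple client surfaces all become consumers of the same stream.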
That matches what I found in the Claude Code snapshot almost exactly.
Not in naming. In shape.
The Snapshot's Most Important Lesson
The strongest thing in the repo is not a fancy tool.
It is not a hidden prompt.
It is not some clever magic buried in one function.
It is the amount of engineering dedicated to keeping long-running agent work alive.
The codebase has entire subsystems for:
- context compaction
- prompt-cache preservation
- sidechain transcripts
- background observer agents
- permission mediation
- worker identity
- task state
- replay after reconnect
- environment re-registration
- websocket deduplication
- token refresh and work-item heartbeat
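To make one of those subsystems concrete, here's a minimal sketch of what context compaction can look like. Every name here is invented for illustration: once the transcript exceeds a token budget, the oldest unpinned messages fold into a summary while pinned items (task state, plan attachments) survive verbatim.

```python
def compact(messages, budget, count_tokens=len):
    """Fold the oldest unpinned messages into a summary once over budget.

    `messages` is a list of dicts with "text" and an optional "pinned"
    flag; everything here is an illustrative stand-in, not a real API.
    """
    total = sum(count_tokens(m["text"]) for m in messages)
    if total <= budget:
        return messages
    kept, dropped = [], []
    for m in messages:
        if m.get("pinned"):
            kept.append(m)              # task state / plans survive verbatim
        elif total > budget:
            dropped.append(m)           # oldest unpinned messages go first
            total -= count_tokens(m["text"])
        else:
            kept.append(m)
    summary = {"text": "[summary of %d earlier messages]" % len(dropped),
               "pinned": False}
    return [summary] + kept
```

A real implementation has to do far more (preserve prompt-cache prefixes, carry tool deltas, summarize with the model itself), but the invariant is the same: compaction is selective, not a blind truncation.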
That is not "extra plumbing."
That is the product.
The Query Loop Alone Isn't Enough
The main query loop in the snapshot is already more sophisticated than most people expect. It is not "send one request, get one answer." It's a state machine that keeps mutating its own state, running tools, compacting context, absorbing recoverable errors instead of surfacing them, retrying, and following up.
But even that is still just the inner engine.
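The shape of that inner engine can be sketched in a few lines. This is a deliberately toy version under my own assumptions, not the snapshot's actual loop:

```python
def query_loop(model, tools, transcript, max_steps=10):
    """Toy inner loop: call the model, run requested tools, feed results
    back into the transcript, retry on transient errors.

    `model` returns either {"type": "final", "text": ...} or
    {"type": "tool_call", "name": ..., "args": ...}; all names invented.
    """
    for _ in range(max_steps):
        try:
            step = model(transcript)
        except TimeoutError:        # recoverable: retry, don't surface it
            continue
        if step["type"] == "final":
            return step["text"]
        if step["type"] == "tool_call":
            result = tools[step["name"]](**step.get("args", {}))
            transcript.append({"role": "tool", "name": step["name"],
                               "content": result})
    raise RuntimeError("step budget exhausted")

# Demo with a fake model that requests one tool, then finishes.
def fake_model(transcript):
    if any(m.get("role") == "tool" for m in transcript):
        return {"type": "final", "text": "done"}
    return {"type": "tool_call", "name": "ls", "args": {}}

out = query_loop(fake_model, {"ls": lambda: "README.md"},
                 [{"role": "user", "content": "list files"}])
```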
Around it, the system has to answer questions like:
- How do tools ask for permission?
- How do permissions propagate across subagents and remote clients?
- How do you keep transcripts valid if streamed output becomes stale?
- How do you let workers fork without destroying prompt-cache identity?
- How do you reconstruct a session after a reconnect?
The answer in this repo is not "hope the model figures it out."
The answer is a lot of explicit runtime machinery.
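For the permission questions alone, a minimal version of that machinery might look like this. The `PermissionMediator` class and its approver callback are illustrative stand-ins, not code from the snapshot:

```python
class PermissionMediator:
    """Toy permission gate: tool calls are checked against granted rules;
    unknown requests escalate to an approver callback, standing in for a
    human on some client surface. Purely illustrative names.
    """
    def __init__(self, approver):
        self.approver = approver
        self.granted: set[str] = set()

    def check(self, tool: str, target: str) -> bool:
        rule = f"{tool}:{target}"
        if rule in self.granted:
            return True                  # previously approved: no re-prompt
        if self.approver(tool, target):  # escalate to the client surface
            self.granted.add(rule)       # remember for subagents / replays
            return True
        return False

# Demo: approvals are asked once, cached, and denials stay denials.
asked = []
def console_approver(tool, target):
    asked.append((tool, target))
    return target.startswith("src/")

mediator = PermissionMediator(console_approver)
ok1 = mediator.check("edit", "src/main.py")   # escalated, approved
ok2 = mediator.check("edit", "src/main.py")   # served from cache
ok3 = mediator.check("edit", "/etc/passwd")   # escalated, denied
```

The caching step is where the "propagate across subagents and remote clients" problem hides: once a grant is state, it has to survive forks, reconnects, and replays like any other state.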
This Is Why Most Agent Demos Don't Generalize
A lot of agent demos are effectively:
- Give the model a tool list.
- Let it call tools.
- Print the result.
That demo can be genuinely impressive.
It can also collapse the second you introduce:
- more than one agent
- more than one client
- real file edits
- long sessions
- partial failures
- approvals
- network instability
The coding agent snapshot I audited looks like it was built by people who had already stepped on all of those rakes.
The structure of the code tells the story:
- transcripts are written not for analytics, but for correctness
- duplicate control responses are explicitly suppressed
- reconnect logic carries sequence numbers across transport swaps
- compaction preserves not just summary text, but task state, plan attachments, skill state, and tool deltas
- background agents continuously maintain memory and summarize worker progress
That's not the architecture of a chatbot. That's the architecture of an operating environment.
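One of those mechanisms, sequence-numbered replay across reconnects, is simple to sketch in isolation. Everything below is an invented toy, not the snapshot's implementation:

```python
class ReplayBuffer:
    """Toy reconnect machinery: every event gets a monotonic sequence
    number; a client that reconnects reports the last number it saw and
    receives only the gap, so transport swaps don't drop or duplicate
    events. All names are illustrative.
    """
    def __init__(self):
        self.seq = 0
        self.events: list[tuple[int, str]] = []

    def publish(self, event: str) -> int:
        self.seq += 1
        self.events.append((self.seq, event))
        return self.seq

    def replay_from(self, last_seen: int) -> list[tuple[int, str]]:
        # Everything after the client's high-water mark, in order.
        return [e for e in self.events if e[0] > last_seen]

# Demo: a client that saw event 1 reconnects and catches up.
buf = ReplayBuffer()
for ev in ("turn.started", "item.added", "turn.completed"):
    buf.publish(ev)
missed = buf.replay_from(last_seen=1)
```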
The Real Product Surface
Once you see the harness clearly, the product starts to look different.
The surface is no longer just "chat with an AI."
It becomes:
- an event protocol
- an execution environment
- a permission system
- a memory system
- a worker topology
- a transport layer
- a recovery model
Which means the core design questions also change.
Instead of asking:
"How smart is the model?"
You start asking:
"What environment makes the model useful for six hours instead of six minutes?"
That is a much better question.
Why This Changes How I Think About Building Agents
The biggest shift for me after reading the snapshot was this:
I stopped thinking of agent products as "AI apps with tools."
I started thinking of them as distributed systems with a model in the middle.
That sounds obvious in hindsight, but it has real consequences.
If you're building a serious coding agent, the hardest parts are probably not:
- adding another tool
- tweaking a prompt
- swapping one frontier model for another
The hardest parts are probably:
- preserving state across failure
- deciding which workers should share context and which should isolate
- making approvals work across surfaces
- stopping context growth from poisoning the session
- exposing enough system state that the agent can debug itself
That's harness work.
And it's the part people under-budget until the product gets real users.
The Deepest Takeaway
The model is the part that reasons.
The harness is the part that makes the reasoning usable.
One gives you intelligence. The other gives you continuity.
One can write code. The other can survive the conditions under which real software gets written.
That's the difference between a compelling demo and an actual product.
In Part 2, I'll zoom in on the core loop itself, because the second big lesson from this codebase is that a coding agent is not a prompt template with tools attached.
It's a stateful query loop with recovery logic everywhere.