
From Vision to Forge: Why I'm Building an Autonomous Dev Team

The Observation Became a Question

In Part 10 of Agentic AI Thoughts, I wrote that the loop had closed. Code was evolving itself. Peter Steinberger built OpenClaw in an hour — three months later, 175,000 stars, no team, the software rewriting itself through prompt requests from people who'd never written code.

I believed every word. I still do.

But after I published that piece, a question started nagging me. If the loop has truly closed — if agents can read their own source, understand their architecture, submit their own PRs — then why does my daily workflow still look like this:

  1. Open Claude Code
  2. Describe a task
  3. Watch it work for 20 minutes
  4. Review the PR
  5. Merge
  6. Open another terminal
  7. Describe the next task
  8. Repeat

I am the orchestrator. I am the dispatcher, the reviewer, the dependency resolver, the context carrier between tasks. The model is brilliant. The harness is powerful. But the loop isn't closed at all — I'm the loop. Every time I copy-paste context from one terminal to another, every time I decide "task B can't start until task A merges," every time I glance at my Linear board and mentally track which issue belongs to which terminal session — that's me doing work that a system should do.

The model can code. But nobody has built the foreman.

What Exists Today

I spent a week studying every open-source tool that claims to solve this.

Composio Agent Orchestrator (5,650 stars, MIT). The most serious attempt. A full plugin architecture — seven typed interfaces for runtime, agent, workspace, tracker, SCM, notifier, and terminal. A session lifecycle with 16 states. A reaction engine that auto-responds to CI failures. Real engineering.

But when you look under the hood: tmux-based, single-machine, polling architecture. State lives in flat files. There's a setInterval loop checking whether agents are still alive. No durable execution — if the process crashes mid-task, the session is gone. No Linear Agent Sessions. No webhook-driven event flow. Good ideas, fragile bones.

AgentsMesh (1,200 stars, BSL). Go-based, Kanban-to-pod binding, real-time topology visualization. The architectural instinct is right: ticket content becomes the agent's prompt, MR linked to ticket and pod. But BSL-licensed, no Feishu integration, no Temporal, and fundamentally oriented toward a different deployment model.

dmux (1,300 stars, MIT). A local terminal multiplexer for AI agents. Press n for a new pane, type a prompt, press m to merge output back. Beautifully simple. But that's all it is — a local tool. No API surface, no webhook integration, no way to trigger it from a chat message or connect it to a project tracker.

Each of these projects got something right. Composio AO's plugin interfaces are genuinely well-designed. AgentsMesh's kanban binding is the right abstraction. dmux proved that the interaction model can be dead simple. But none of them close the actual loop.

The actual loop is: instruction in, merged code out, with the human doing nothing in between.

The Gap

Here's what I want to happen:

I send a message in Feishu: "Implement user authentication with JWT, add rate limiting to all API endpoints, and write integration tests."

Then I go make coffee.

When I come back, three Linear issues exist. Three PRs are open, each auto-reviewed. The ones that passed review are already merged. The one that failed has been re-submitted with the reviewer's feedback addressed. A summary card in Feishu shows me what happened.

That's it. That's the entire product.

But to make that work, you need answers to questions that none of the existing tools have solved:

How do you keep state alive for 30 minutes? A coding agent can run for half an hour on a complex task. If your orchestrator crashes at minute 29, do you lose everything? Composio AO's answer is "yes." Traditional serverless times out. A cron job can't maintain workflow state. You need durable execution — a system that survives crashes, replays from the last checkpoint, and resumes exactly where it left off.

How do you wait without burning compute? After an agent submits a PR, it might wait 5 minutes for a review. A polling loop is wasteful. A setTimeout is fragile. You need a system where "wait for an external event" costs zero resources.

How do you isolate agents from secrets? If you give an AI agent full-auto bash access in an environment that holds GitHub tokens, Linear API keys, and Feishu credentials, the agent can read all of them. One prompt injection in a code review comment, and an attacker exfiltrates every secret in your environment. You need privilege separation: the thing that writes code should not be the thing that calls APIs.

How do you prevent an agent from approving its own PR? GitHub blocks self-approval. If your agent creates the PR and your CI bot reviews it with the same identity, GitHub rejects the review. You need separate identities — a bot account that writes code, a different token that reviews it.

How do you close the feedback loop? When a reviewer leaves a comment on a PR, that comment needs to reach the agent that wrote the code, not just land in a notification inbox. The agent needs to read the feedback, understand it, revise the code, push again, and wait for the next review. This is a stateful conversation across two systems (GitHub and your orchestrator) that can span hours.
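The review round-trip described above can be sketched as a small state machine. Everything here (the state names, the event shapes, the round counter) is an illustrative sketch, not Foundry's actual types; it only shows why the conversation is stateful and why it needs an explicit cap:

```typescript
// Sketch of the per-PR review loop as a pure state machine. Hypothetical
// names; a real orchestrator would drive this from GitHub webhook events.

type ReviewState =
  | { kind: "awaiting_review"; round: number }
  | { kind: "revising"; round: number; feedback: string }
  | { kind: "merged" }
  | { kind: "escalated"; reason: string };

type ReviewEvent =
  | { kind: "approved" }
  | { kind: "changes_requested"; feedback: string }
  | { kind: "revision_pushed" };

const MAX_ROUNDS = 5; // the "up to 5 rounds" cap mentioned later in the post

function step(state: ReviewState, event: ReviewEvent): ReviewState {
  if (state.kind === "awaiting_review" && event.kind === "approved") {
    return { kind: "merged" };
  }
  if (state.kind === "awaiting_review" && event.kind === "changes_requested") {
    if (state.round >= MAX_ROUNDS) {
      return { kind: "escalated", reason: "review round limit reached" };
    }
    // The feedback is handed to the coding agent, which revises and pushes.
    return { kind: "revising", round: state.round + 1, feedback: event.feedback };
  }
  if (state.kind === "revising" && event.kind === "revision_pushed") {
    return { kind: "awaiting_review", round: state.round };
  }
  return state; // ignore events that don't apply in the current state
}
```

The point of modeling it this way: every transition is triggered by an external event, so the orchestrator's job is delivering those events to the right state machine, possibly hours apart.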

Each of these is a solved problem in isolation. But nobody has wired them together.

The Thesis

Foundry is the wiring.

The core bet: Temporal as the backbone. Every other architectural choice flows from this one.

Temporal is a workflow engine built for exactly this kind of problem — long-running, stateful processes that interact with external systems through events. OpenAI uses it for production Codex. The key primitive is workflow.condition(): the workflow pauses, consuming zero resources, and resumes only when a Signal arrives. An agent submits a PR, the workflow pauses, a GitHub webhook fires when the review lands, the webhook translates to a Temporal Signal, the workflow resumes. No polling. No timeouts. No lost state.
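The pause-and-resume shape can be illustrated in a few lines of plain TypeScript. To be clear, this toy is not the Temporal SDK: a real workflow would use defineSignal, setHandler, and workflow.condition() from @temporalio/workflow. But the essential pattern is the same: the workflow awaits something that only an incoming event resolves, so nothing polls while it waits.

```typescript
// Toy stand-in for the Signal-then-resume pattern (plain TypeScript,
// deliberately not the Temporal SDK).

class SignalGate<T> {
  private resolve!: (value: T) => void;
  // The "workflow" awaits this; nothing runs until someone signals.
  readonly wait: Promise<T> = new Promise((res) => (this.resolve = res));
  // Called by the webhook-to-Signal bridge when the external event lands.
  signal(value: T): void {
    this.resolve(value);
  }
}

async function childWorkflow(gate: SignalGate<string>): Promise<string> {
  // ... the agent codes and opens a PR here ...
  const review = await gate.wait; // analogous to workflow.condition()
  return review === "approved" ? "merged" : "revise";
}
```

In real Temporal the pause also survives process restarts, because the workflow replays from history to reach the same await; that durability is exactly what the promise toy above cannot give you.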

On top of Temporal, three separated workers:

  • Orchestration worker: lightweight, runs workflow logic, holds no secrets
  • Agent worker: CPU-heavy, runs claude -p and codex --quiet subprocesses, holds only AI API keys
  • Integration worker: I/O-bound, calls Linear, GitHub, and Feishu APIs, holds all integration secrets

The agent worker never sees a GitHub token. The orchestration worker never touches a subprocess. Each worker is a minimal attack surface. If an agent gets prompt-injected through a malicious code review comment, the worst it can do is write bad code — it can't exfiltrate your Linear API key because it literally doesn't have one.
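One way to make that separation concrete is a keyring map plus a routing rule, so each activity can only run on the worker that holds the secrets it needs. The worker names, environment variable names, and activity prefixes below are a hypothetical sketch, not Foundry's actual configuration:

```typescript
// Hypothetical keyring map: which worker is allowed to hold which secrets.
type Worker = "orchestration" | "agent" | "integration";

const KEYRINGS: Record<Worker, string[]> = {
  orchestration: [], // workflow logic only; holds no secrets at all
  agent: ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"], // AI keys only
  integration: ["GITHUB_TOKEN", "LINEAR_API_KEY", "FEISHU_APP_SECRET"],
};

// Route each activity to the one worker whose keyring it needs. In Temporal
// terms, each worker would poll a different task queue.
function workerFor(activity: string): Worker {
  if (activity.startsWith("agent.")) return "agent"; // claude -p, codex --quiet
  if (activity.startsWith("api.")) return "integration"; // GitHub/Linear/Feishu
  return "orchestration";
}
```

The invariant worth enforcing in CI is the negative one: the agent worker's keyring must never contain an integration secret, because that is the worker that executes model-generated commands.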

Two workflow tiers mirror the task hierarchy:

  • Parent workflow receives the instruction, calls Claude to decompose it into tasks, creates Linear issues, fans out N parallel child workflows, aggregates results, and sends a Feishu summary card
  • Child workflow handles one task: code β†’ PR β†’ review β†’ iterate (up to 5 rounds) β†’ merge. Each child runs in an isolated git worktree

The parent workflow is the foreman. The child workflows are the builders. Temporal is the construction site that keeps everyone safe.
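Stripped of Temporal specifics, the parent's fan-out/fan-in shape is small. In this sketch, decompose and runChild are hypothetical stand-ins for the Claude decomposition call and a child workflow execution; in real Temporal the map over tasks would be startChild/executeChild calls:

```typescript
// Sketch of the parent workflow's fan-out/fan-in shape in plain TypeScript.

interface TaskResult {
  task: string;
  status: "merged" | "failed";
}

async function parentWorkflow(
  instruction: string,
  decompose: (instruction: string) => Promise<string[]>, // e.g. a Claude call
  runChild: (task: string) => Promise<TaskResult>, // code, PR, review, merge
): Promise<TaskResult[]> {
  // Decomposition would also create one Linear issue per task.
  const tasks = await decompose(instruction);
  // Fan out: one child per task, all in parallel, each in its own worktree.
  const results = await Promise.all(tasks.map(runChild));
  // Fan in: aggregation happens here; a Feishu summary card goes out next.
  return results;
}
```

Promise.all is the simplest aggregation policy; a real foreman would likely want allSettled semantics so one failed child doesn't discard its siblings' merged work.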

What This Series Will Cover

This is Part 1 — the "why." Foundry doesn't exist yet as running code. It exists as five research documents, an architecture blueprint, and a security review that found four critical vulnerabilities in my initial design (all resolved before writing a single line of implementation).

The series will follow the build:

  • Part 2: What five research reports taught me before I wrote any code
  • Part 3: Three workers, three keyrings β€” privilege separation for AI agents
  • Part 4: The first end-to-end loop: one Feishu message to one merged PR
  • Parts 5+: Multi-task parallelism, Linear Agent Sessions, recovery from failure

I could have started coding on day one. I spent a week on research instead. In Part 8 of Agentic AI Thoughts, I wrote that orchestration matters more than generation. Foundry is the bet that orchestration of orchestration — the layer above the agent — matters even more.

The model can code. The harness gives it hands. Foundry gives it a job site, a foreman, and a safety protocol.

Let's build.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0