
Four Reviewers, Three Fixers, One Release Pipeline

The Other Half

Part 4 was about building — ten tasks, five AIs, three waves of DAG-scheduled implementation. Mio's multimodal input, enhanced onboarding, and selfie generation all landed in a single session.

But building is only half the job. 81 commits from five parallel agents, spanning three architectural layers. The code worked. Every typecheck passed. But "it works" and "it holds up" are different questions.

This post covers the other half: code review, bug fixing, and release management — all using the same Agent Team pattern, applied to a completely different kind of work.

Starting State

The session started after a crash recovery — context had been compacted, so I had to re-orient. First thing: check what we're working with.

38 commits ahead of origin
TypeScript: 0 errors

Good baseline. But while scanning the repo state, I caught something: a {custom_story} placeholder was leaking into LLM prompts. Five personality config files contained dead template placeholders that were supposed to be replaced during onboarding, but since the replacement logic didn't touch those specific placeholders, they passed through as literal text — {custom_story} appearing verbatim in the system prompt sent to the model.

This is the kind of bug that "works" — the model ignores the gibberish and generates fine responses. But it wastes tokens on every single message, and it looks terrible if anyone ever inspects the prompts.

Fixed it by removing the dead placeholders from all five personality config files. Tiny fix, but this is exactly why you review before shipping.
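To see why the placeholder leaked, here's a minimal sketch (identifiers hypothetical, not Mio's actual code): the replacement logic only substitutes placeholders it knows about, so anything else survives verbatim into the prompt. A cheap regex guard catches survivors before they reach the model.

```typescript
// Hypothetical sketch of the failure mode. The onboarding replacement
// only knows some placeholders; unknown ones pass through verbatim.
const KNOWN_PLACEHOLDERS: Record<string, string> = {
  "{user_name}": "Alice",
  "{about_user}": "Loves hiking",
};

function fillTemplate(template: string): string {
  let out = template;
  for (const [key, value] of Object.entries(KNOWN_PLACEHOLDERS)) {
    out = out.split(key).join(value); // literal substitution, no regex surprises
  }
  return out;
}

// Cheap guard: detect any "{...}" placeholder that survived replacement.
function leftoverPlaceholders(prompt: string): string[] {
  return prompt.match(/\{[a-z_]+\}/g) ?? [];
}

const prompt = fillTemplate("Hi {user_name}. Story: {custom_story}");
console.log(leftoverPlaceholders(prompt)); // ["{custom_story}"] leaks through
```

A guard like `leftoverPlaceholders` turns a silent token-wasting bug into a loud failure at startup.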

Setting Up the Review

I told Claude Code: "ok do a full review with agent team of the implementations."

It spawned four specialized review agents, each assigned a domain:

  1. Media reviewer — transcribe, vision, process, selfie, reference-images, file-download
  2. Onboarding reviewer — state machine, commands, preset configs, schema
  3. Pipeline reviewer — server index.ts, router, system-prompt, agent loop
  4. Security auditor — API keys, input validation, path traversal, SSRF, prompt injection, cost abuse

Why specialize instead of running four general-purpose reviewers? Same reason you'd have a security specialist on a human team — domain expertise surfaces different kinds of issues. A general reviewer reads code for correctness. A security auditor reads code for attack vectors. They look at the same file and see different things.

The Results

All four reviews came back in about five minutes. Here's the summary:

Reviewer     Verdict   CRITICAL  HIGH  MEDIUM
Media        WARNING   0         4     10
Pipeline     BLOCK     2         4     6
Onboarding   WARNING   0         4     7+
Security     BLOCK     2         4     5

Two BLOCKs. Two WARNINGs. Not shippable.

The CRITICAL Issues

Four findings that needed immediate fixes:

  1. Missing cost attribution — processMedia() was called before userId was available, so media processing costs (transcription, vision API calls) couldn't be attributed to the right user
  2. Typing indicator leak — when a message batch was empty, clearInterval was never called on the typing indicator, so the bot would show "typing..." forever
  3. Path traversal in HTTP API — the /api/agents endpoint took a presetId parameter that was passed directly to a file path without validation. Classic directory traversal.
  4. Bot token in download URL — Telegram file download URLs contain the bot token. If an error occurred during download, the URL (including the token) could leak into error logs.
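The typing indicator leak is a classic early-return bug. Here's a minimal sketch of the shape of the fix (function names hypothetical, not the actual bot code): the interval must be cleared on every exit path, which a `finally` block guarantees.

```typescript
// Hypothetical sketch: the interval must be cleared on every exit path,
// including the early return for an empty batch.
function handleBatch(
  batch: string[],
  sendTyping: () => void,
  process: (msgs: string[]) => void,
): void {
  const typing = setInterval(sendTyping, 4000); // keep "typing..." alive
  try {
    if (batch.length === 0) return; // before the fix, this path skipped cleanup
    process(batch);
  } finally {
    clearInterval(typing); // runs on every path, empty batch included
  }
}
```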

The HIGH Issues

After deduplication across all four reviewers (some issues were flagged by multiple agents), roughly ten HIGH-severity findings:

  • Unbounded reference image cache — flagged by three separate reviewers
  • No timeout on selfie/media operations
  • No rate limiting on generation endpoints
  • Template injection via $ in .replace() strings
  • Unbounded file downloads when item.size was undefined
  • No MIME type validation
  • Timeout bypasses about_user validation with empty string
  • Race condition in /reonboard during active onboarding
  • NaN token counts in bubble transform
  • Silent cost tracking failures (.catch(() => {}) swallowing errors)
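The template injection finding is worth a closer look, because it's easy to miss: JavaScript's String.prototype.replace treats sequences like `$&` and `$'` in the replacement string as special patterns. A sketch (strings illustrative):

```typescript
// "$" in .replace() replacement strings is special: "$&" expands to the
// matched text, "$'" to the text after the match, "$`" to the text before.
const template = "Bio: {about_user}";
const userInput = "I love $& and $' signs"; // user-controlled text

// Naive replacement: the special patterns get expanded.
const naive = template.replace("{about_user}", userInput);

// Split/join performs purely literal substitution.
const safe = template.split("{about_user}").join(userInput);

console.log(naive); // mangled: "$&" re-inserts "{about_user}" itself
console.log(safe);  // "Bio: I love $& and $' signs"
```

This is why the fix agents later switched prompt assembly from `.replace()` to the split/join pattern.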

Some of these would've been caught by a single careful reviewer. But the path traversal and bot token leak? Those came from the security auditor specifically probing for attack vectors. The cost attribution issue? That came from the pipeline reviewer tracing data flow across module boundaries. Specialization earned its keep.

The Fix Pattern

With the review results in hand, I spawned three fix agents with strict file ownership:

Security Fixer

Files: agents.ts, file-download.ts, onboarding.ts, process.ts

  • Path traversal: added allowlist validation for preset IDs
  • Bot token leak: replaced raw error messages with generic ones
  • Template injection: switched from .replace() to split/join pattern
  • MIME validation: added allowlists for audio, image, and video types
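The path traversal fix follows a standard shape. A minimal sketch, with hypothetical preset IDs and paths (not the actual repo layout): validate against a fixed allowlist before the ID ever touches a filesystem path, and keep error messages generic so nothing sensitive leaks into logs.

```typescript
import * as path from "node:path";

// Hypothetical allowlist; the real preset IDs live in the repo's configs.
const ALLOWED_PRESETS = new Set(["mio", "sage", "spark"]);

function presetConfigPath(presetId: string): string {
  if (!ALLOWED_PRESETS.has(presetId)) {
    // Generic message: no paths, no tokens, nothing useful to an attacker.
    throw new Error("Unknown preset");
  }
  return path.join("presets", `${presetId}.json`);
}

console.log(presetConfigPath("mio")); // resolves to presets/mio.json
// presetConfigPath("../../secrets")  // rejected before touching the filesystem
```

An allowlist beats sanitizing the input (stripping `..`, slashes, etc.) because there's nothing to get wrong: anything not explicitly known is refused.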

Pipeline Fixer

Files: index.ts, loop.ts, reference-images.ts

  • Typing indicator: added clearInterval on empty batch path
  • Cost attribution: passed Telegram user ID through to processMedia()
  • NaN tokens: replaced with 0 in synthetic events
  • Selfie timeout: added 30-second AbortController
  • Cache bounds: max 10 entries, oldest-first eviction
  • Cost logging: replaced .catch(() => {}) with actual error logging
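The cache bound is simple to implement because a JavaScript Map iterates keys in insertion order, so the first key is always the oldest entry. A sketch under that assumption (names hypothetical):

```typescript
// Bounded cache with oldest-first eviction. Map preserves insertion
// order, so .keys().next() yields the oldest entry.
const MAX_ENTRIES = 10;
const referenceImageCache = new Map<string, Uint8Array>();

function cachePut(key: string, value: Uint8Array): void {
  if (referenceImageCache.size >= MAX_ENTRIES && !referenceImageCache.has(key)) {
    const oldest = referenceImageCache.keys().next().value;
    if (oldest !== undefined) referenceImageCache.delete(oldest);
  }
  referenceImageCache.set(key, value); // overwrite doesn't grow the map
}
```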

Onboarding Fixer

Files: onboarding.ts, process.ts, commands.ts

  • Timeout default: set to "用户未填写自我介绍" ("User did not provide a self-introduction") instead of an empty string
  • Reonboard race: added clearOnboardingState() before starting new session
  • Post-download size check: validate after download when Telegram omits file size
  • Promise.allSettled: single media failure no longer drops all results
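The Promise.allSettled change is the difference between "one bad voice note kills the whole batch" and "one bad voice note gets logged." A sketch of the shape (function names hypothetical):

```typescript
// Promise.all rejects on the first failure; allSettled reports every
// outcome, so successes survive a single bad item.
async function processAll(
  items: string[],
  processOne: (item: string) => Promise<string>,
): Promise<string[]> {
  const settled = await Promise.allSettled(items.map(processOne));
  const results: string[] = [];
  settled.forEach((r, i) => {
    if (r.status === "fulfilled") results.push(r.value);
    else console.error(`media item ${items[i]} failed:`, r.reason); // log, don't swallow
  });
  return results;
}
```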

Notice the pattern: no two agents edit the same file. This is critical. In Part 4, file ownership prevented merge conflicts during the build phase. Same principle applies here — three agents fixing bugs simultaneously only works if they don't step on each other.

A fourth agent — the doc updater — ran in parallel, updating CLAUDE.md, API.md, SCHEMA.md, TECHNICAL.md, DATA-FLOWS.md, CONVENTIONS.md, and TODO.md. Documentation drift is real, and it's cheaper to fix it now than to discover stale docs later.

Changelog and Release

With all fixes committed, I said: "let's also add a changelog thing."

Claude Code spawned two changelog agents in parallel:

  • v0.0.1 agent: analyzed the first 39 commits (everything before Feb 26 10am PST) — the foundation build
  • v0.0.2 agent: analyzed the next 81 commits — the three-feature sprint from Part 4 plus all the review fixes

Both wrote their sections, which were merged into a single CHANGELOG.md. Then:

git tag v0.0.1 <commit-hash>
git tag v0.0.2
gh release create v0.0.1 --notes-file ...
gh release create v0.0.2 --notes-file ...

Two versions, two tags, two GitHub releases. The boring, mechanical work that nobody wants to do manually — handled by agents while I watched.

Process Improvements

The last step was institutional memory. I added two things to the project's CLAUDE.md:

  1. A "Releases & Versioning" section documenting the changelog and tagging workflow
  2. A rule: "Always use doc-updater subagent for documentation updates"

This is the part most people skip. You fix the bug, ship the release, and move on. But if the process improvement doesn't get written down, you'll hit the same friction next time. CLAUDE.md is the project's memory — what goes in there persists across sessions, across context compactions, across crash recoveries.

As I described in the VM debugging post, crash recovery is a reality of long sessions. The more you encode into CLAUDE.md, the less you lose when context resets.

The Pattern

Here's what the full review-to-release cycle looked like:

Phase 1: Pre-review cleanup        (1 agent, manual fixes)
Phase 2: Parallel code review       (4 specialized reviewers)
Phase 3: Parallel fixes             (3 fixers + 1 doc updater)
Phase 4: Changelog + release        (2 changelog agents)
Phase 5: Process improvements       (encode learnings into CLAUDE.md)

Five phases, eleven agents total, all within a single Claude Code session.

Best Practices

A few things I've learned from running review-and-fix cycles with agent teams:

Specialize your reviewers. A security auditor and a pipeline reviewer find different bugs in the same file. General-purpose reviews miss domain-specific issues.

Strict file ownership for fixers. Never let two agents edit the same file. Split responsibilities by file, not by issue. If two issues touch the same file, assign them to the same fixer.

Fix in parallel, not sequentially. Three fixers working simultaneously is three times faster than one fixer working through a queue. The overhead of coordinating file ownership is trivial compared to the time saved.

Run doc updates alongside fixes. Don't wait until "later" — later never comes. A doc-updater agent running in parallel costs nothing and prevents documentation drift.

Automate the ceremonial work. Changelogs, tags, releases — these are mechanical tasks that don't benefit from human judgment. Let agents handle them.

Encode process improvements immediately. If you learned something during the review, write it into CLAUDE.md before ending the session. Institutional memory compounds.

Building and Reviewing

Part 4 showed how to use Agent Teams to build — decompose a plan, schedule dependencies, let five agents write code in parallel.

This post showed the other side: using the same pattern to review, fix, and release. The mechanics are the same — task decomposition, file ownership, parallel execution. But the shape of the work is different. Building is creative and forward-looking. Reviewing is adversarial and retrospective. Both benefit from specialization and parallelism.

The full workflow for Mio v0.0.2 was: plan, build (5 agents), review (4 agents), fix (3 agents), release (2 agents), improve (encode learnings). One session, fourteen agents, one human making decisions at the top.

That's the complete cycle. Not just building with AI — shipping with AI.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0