The AI-Native Team

In the previous piece, I read the employee handbooks of MrBeast, Netflix, and Duolingo — three companies that each generate over $1M per employee — and found three distinct archetypes for how organizations encode and transfer taste.

MrBeast: one person's judgment drives everything. Netflix: hire autonomous people, give minimal rules. Duolingo: build a machine that generates taste through experiments.

Each has strengths. Each has blind spots. MrBeast's model translates most easily to AI teams but loses the human pushback that keeps the founder honest. Netflix's model produces the best culture but requires agents that can genuinely think independently. Duolingo's model is the most naturally AI-amplifiable but assumes you can measure everything that matters.

So here's the question: if you were designing a team from scratch for the agent era — not adapting an existing org, but building from zero — what would it look like?

Three arguments:

1. The AI-native team is 5-7 humans selected entirely for Will. Not Skill. Will — taste, judgment, direction, curiosity. AI handles all Skill. This isn't downsizing. It's redesigning what a "team" means.

2. Three design principles. Taste diversity across domains (not one founder's taste, not consensus). Engineered dissent (Netflix's "farming for dissent," mechanized for AI). Context fabric (Block's "world model," replacing hierarchy's routing function).

3. The hardest unsolved problem isn't building the team — it's evaluating it. We have GAAP for accounting, ICD-10 for medical coding, test suites for software. We have no equivalent for taste. Until we do, AI-native teams will be constrained by their inability to measure their own judgment quality.

1. The Premise: Execution Is Approaching Zero Cost

I laid the groundwork for this in two earlier pieces.

In Dao Rises, Skill Fades, I argued that Will (direction, taste, judgment) is appreciating while Skill (execution, technical craft) is depreciating. Hire for Will, not Skill. The resume is a Skill document — years of experience, tools mastered, certifications earned. All depreciating assets. The question nobody's figured out yet: what does a Will document look like?

In When AI Delivers Results, I analyzed Sequoia's framework: for every $1 companies spend on software, they spend $6 on services. AI is coming for the $6 market — not selling tools, but delivering outcomes. Block's organizational experiment eliminates permanent middle management and replaces hierarchy's information-routing function with an AI world model.

The synthesis: when execution cost approaches zero, the only thing that differentiates teams is the quality of their judgment.

My own team at Compute Labs is already operating at this frontier: three people doing the work of fifteen. Not because we're 5x better engineers, but because each person has strong Will (clear taste, independent judgment, relentless curiosity) and AI handles the Skill layer. With Claude Code doing the execution, the three of us outproduce a fifteen-person team whose members were mostly hired for technical skills that AI can now replicate.

This isn't a temporary efficiency gain. It's a structural shift. And it has specific implications for how you design a team.

2. What the Team Looks Like

Concretely: 5-7 humans. No hierarchy. No permanent roles. Every person is what Block calls a "player-coach" — they both think and ship. Each person owns a taste domain — a specific area where their judgment is the tiebreaker.

Surrounding these 5-7 humans: dozens to hundreds of AI agents handling all execution. Code, analysis, research, experiment design, content production, data processing. The agents don't need to be managed in the traditional sense — they need to be directed. Given context, objectives, and quality criteria. The humans don't write code. They write intent.

A day in this team looks roughly like this:

Morning: Each person reviews overnight agent output in their taste domain. An agent ran 40 experiments on onboarding flow variants — the product person reviews results and picks the three worth pursuing. An agent drafted four approaches to a partnership proposal — the business person reads them, picks the best frame, edits the judgment calls, sends it. An agent flagged three anomalies in user data — the analytics person investigates the one that doesn't look like noise.

Midday: The team syncs. Not a status update — nobody needs to "report" what they did because the shared context fabric already shows everything. The sync is for judgment alignment. "I'm seeing this pattern in user behavior. Does it change your thesis about pricing?" "The experiment results suggest our assumption about retention was wrong. Should we pivot the feature direction?" These are taste conversations. No Skill conversations needed.

Afternoon: Each person sets the next cycle of agent work. Not task-by-task micromanagement — high-level objectives with constraints. "Explore three alternative architectures for the notification system. Optimize for simplicity and maintainability, not feature count. Show me the trade-offs, don't pick for me." The agents work overnight. The cycle repeats.
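To make "they write intent" concrete, here is a minimal sketch of the afternoon's notification-system brief as structured data instead of free text. The `IntentBrief` class, its fields, and `to_prompt()` are hypothetical, not a feature of Claude Code or any agent framework; the point is only that the objective, the constraints, and the expected deliverable are stated explicitly, while nothing says how to do the work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntentBrief:
    """Hypothetical brief for one cycle of agent work: intent, not tasks."""
    owner: str                  # human whose taste domain covers this work
    objective: str              # what to explore or produce
    optimize_for: List[str]     # qualities the owner cares about
    deprioritize: List[str] = field(default_factory=list)  # explicitly not the goal
    deliverable: str = "Show me the trade-offs; don't pick for me."

    def to_prompt(self) -> str:
        """Render the brief as plain-text instructions for an agent."""
        lines = [
            f"Owner: {self.owner}",
            f"Objective: {self.objective}",
            "Optimize for: " + ", ".join(self.optimize_for),
        ]
        if self.deprioritize:
            lines.append("Do not optimize for: " + ", ".join(self.deprioritize))
        lines.append(f"Deliverable: {self.deliverable}")
        return "\n".join(lines)


brief = IntentBrief(
    owner="product",
    objective="Explore three alternative architectures for the notification system.",
    optimize_for=["simplicity", "maintainability"],
    deprioritize=["feature count"],
)
print(brief.to_prompt())
```

The brief says nothing about frameworks, file layout, or steps; those are Skill decisions left to the agents.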

The critical insight: no one in this team was hired for what they can do. They were hired for what they can see. The ability to look at 40 experiment results and know which three matter. The ability to read a draft proposal and feel where the framing is wrong. The ability to spot the signal in the noise. That's taste. And it's the only thing the team is made of.

3. Three Design Principles

3.1 Taste Diversity

The first handbook piece identified three approaches to taste in organizations. MrBeast concentrates it in one person. Netflix distributes it across autonomous individuals. Duolingo generates it through experiments.

The AI-native team needs a fourth approach: a small council of diverse taste.

Not one founder's taste — that creates a single point of failure. Not distributed autonomous taste — that requires agents capable of genuine independent judgment, which we don't have yet. Not pure experiment-driven taste — that only works for things you can measure, and the most important decisions often can't be measured until it's too late.

Instead: 5-7 people who each own a taste domain, with enough diversity that they see different things when looking at the same data. Zhang Yueguang's principle from Dao Rises, Skill Fades: "A team needs shared Vision and aligned values, but diverse Taste, working styles, and thinking patterns."

This matters more as AI gets better, not less. Give five engineers the same AI tools and the same spec, and they produce nearly identical code. Execution is converging. The differentiator is which perspectives decided what to build in the first place. One person catches a UX problem others miss. Another has cross-domain knowledge that reframes the architecture. A third has taste in communication that makes the marketing land differently.

AI makes execution homogeneous. Taste diversity is the only remaining edge.

3.2 Engineered Dissent

Netflix's most powerful organizational concept is "farming for dissent" — before making a significant decision, the responsible person actively seeks out people who disagree. Not waiting for objections. Hunting them down.

In a human organization, this happens naturally. People have egos, opinions, competing priorities. Disagreement is the default. The hard part is channeling it productively.

In an AI-augmented organization, the opposite problem emerges. AI agents are agreeable by design. Ask an agent "is this a good strategy?" and it will find something positive to say. Show it your plan and it will praise the approach. This isn't a bug — it's a training outcome. Models are optimized for helpfulness, and helpfulness often gets conflated with agreement.

This creates a dangerous dynamic: the team makes a decision, asks agents to evaluate it, and gets back consensus. Except it's not consensus — it's sycophancy.

The fix: engineer dissent into the system.

Three mechanisms:

Red-team agents. Before any major decision ships, a dedicated agent (or set of agents) with deliberately adversarial system prompts evaluates it. "Find every way this could fail." "What are we not seeing?" "Who gets hurt by this decision?" The red-team agent's job is explicitly not to be helpful — it's to be hostile. The team reviews its objections and decides which ones are load-bearing.

Diverse priors. Run the same analysis with multiple agents configured with different assumptions. One agent assumes the market is growing. Another assumes it's shrinking. One assumes the user is sophisticated. Another assumes they're a first-timer. When the outputs diverge, that divergence IS the information. The disagreement between agents surfaces the judgment calls the team needs to make.

Historical calibration. Track the team's past decisions and their outcomes. "Six months ago we decided X. The data now shows Y. Were we right?" This creates a feedback loop on taste quality — not just "did the output look good?" but "did the judgment actually work?" Over time, this builds an institutional understanding of where the team's taste is strong and where it's miscalibrated.

Netflix gets dissent for free because humans naturally disagree. The AI-native team has to build dissent on purpose.
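As one way to build dissent on purpose, here is a sketch of the diverse-priors mechanism. It assumes, hypothetically, that each agent is a callable returning a numeric estimate for a question under a given prior; real agents return prose, so a numeric forecast stands in here to make divergence measurable. `run_with_priors`, `divergence`, and the 0.25 threshold are illustrative choices, not a standard.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical agent interface: (question, prior) -> numeric estimate.
Agent = Callable[[str, str], float]


def run_with_priors(agent: Agent, question: str, priors: List[str]) -> Dict[str, float]:
    """Ask the same question once per prior and collect the estimates."""
    return {prior: agent(question, prior) for prior in priors}


def divergence(estimates: Dict[str, float]) -> float:
    """Coefficient of variation: spread of the estimates relative to their mean."""
    values = list(estimates.values())
    m = mean(values)
    return pstdev(values) / abs(m) if m else float("inf")


def needs_judgment_call(estimates: Dict[str, float], threshold: float = 0.25) -> bool:
    """Route the question to a human when the priors disagree too much."""
    return divergence(estimates) > threshold


# Stub agent for illustration: its answer depends only on the prior it was given.
def stub_agent(question: str, prior: str) -> float:
    return {"market growing": 1.4, "market shrinking": 0.7}.get(prior, 1.0)


estimates = run_with_priors(
    stub_agent, "Projected revenue multiple next year?", ["market growing", "market shrinking"]
)
print(estimates, round(divergence(estimates), 3), needs_judgment_call(estimates))
```

The coefficient of variation is just one simple spread measure. What matters is the routing rule: a large disagreement goes to a human instead of being averaged away.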

3.3 Context Fabric

In When AI Delivers Results, I analyzed Block's four-layer architecture: capability atoms, world model, intelligence layer, and interface. The key insight: hierarchy's essential function isn't power — it's information routing. A manager's job is translating strategy downward and aggregating status upward. It's being a router.

Block replaces that router with a "world model": a real-time, comprehensive representation of company state. When a signal appears (a merchant's tax deadline is approaching just as their loan is approved), the system detects it and acts on it. No PM needed to think of that feature. The system discovered the need.

For the AI-native team, this becomes the context fabric — a shared layer that makes all relevant information available to every team member and every agent, in real time. No status meetings. No "I didn't know about that." No information silos.

Concretely: every agent interaction, every experiment result, every customer signal, every decision and its rationale — all of it flows into a shared context layer. When the product person is evaluating experiment results, they automatically see the business context (a partnership deal that might change the product direction). When the business person is drafting a proposal, they see the latest product metrics that strengthen the case.

This isn't a dashboard. Dashboards are static summaries you pull. The context fabric is a live system that pushes relevant context to whoever needs it, when they need it. The routing is done by AI, not by managers.

The prerequisite — and Block's paper makes this clear — is high-quality, structured data. Block has payment data: every transaction is a structured event. Most companies don't have that. For the AI-native team, the context fabric starts with a discipline: every decision, every experiment, every judgment call gets recorded in structured form. Not for accountability. For context. So the next person (or agent) who encounters a similar situation has the benefit of what you learned.

This is also, incidentally, why the CLAUDE.md analogy runs deeper than metaphor. A CLAUDE.md file (the project instructions file Claude Code reads into its context) IS a context document: it tells the agent what it needs to know to make good decisions. The context fabric is a living CLAUDE.md that updates in real time.
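The recording discipline and push-based routing described above can be sketched with a structured decision record and a toy router that delivers each new record to everyone subscribed to its topics. `DecisionRecord`, `ContextFabric`, and topic-tag matching are hypothetical simplifications; tag overlap is a crude stand-in for the relevance judgment a real context fabric would need.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DecisionRecord:
    """One judgment call, recorded in structured form for future context."""
    made_on: date
    owner: str          # whose taste domain the decision falls under
    decision: str
    rationale: str      # the "why", which is what the next person needs
    topics: frozenset = field(default_factory=frozenset)


Handler = Callable[[DecisionRecord], None]


class ContextFabric:
    """Toy push router: each new record goes to the subscribers of its topics."""

    def __init__(self) -> None:
        self.log: List[DecisionRecord] = []  # append-only history for later lookup
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def record(self, rec: DecisionRecord) -> None:
        self.log.append(rec)
        delivered = set()
        for topic in sorted(rec.topics):
            for handler in self._subscribers.get(topic, []):
                if id(handler) not in delivered:  # push once even if several topics match
                    delivered.add(id(handler))
                    handler(rec)


inbox: Dict[str, List[str]] = defaultdict(list)
fabric = ContextFabric()
fabric.subscribe("pricing", lambda rec: inbox["product"].append(rec.decision))
fabric.subscribe("partnerships", lambda rec: inbox["business"].append(rec.decision))

fabric.record(DecisionRecord(
    made_on=date.today(),
    owner="business",
    decision="Offer usage-based pricing to the partner",
    rationale="The partner's volume is seasonal",
    topics=frozenset({"pricing", "partnerships"}),
))
print(dict(inbox))
```

Nobody pulled anything: the product person learns about the pricing decision because it was recorded with a topic they care about.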

4. Hiring for Will

If the AI-native team selects entirely for Will, what does the hiring process look like?

The traditional interview is a Skill test. Whiteboard coding. System design. "Tell me about a time you..." — all testing what you've done, not how you think.

A Will-based hiring process would look fundamentally different:

Taste test. Show the candidate ten products, ten designs, ten strategies. Ask them to rank, critique, and explain. Not "which is best?" — there's no right answer. The question is: "do they see things I don't? Is their taste complementary to the existing team's?"

Judgment simulation. Present an ambiguous scenario with incomplete data — a real one, from the company's history. "Here's what we knew at the time. What would you have decided? Why?" Then reveal what actually happened. The quality of their reasoning under uncertainty matters more than whether they got the "right" answer.

Curiosity probe. What have they explored recently that wasn't required by their job? What rabbit holes have they gone down? What do they know about that has nothing to do with their resume? Cross-domain curiosity is a leading indicator of Will — people who want to understand everything tend to develop better judgment than people who optimize for depth in one domain.

Dissent exercise. Present the team's current strategy and ask them to argue against it. Not as a gotcha — as a genuine test of whether they can form independent opinions and articulate them clearly. If they can't push back on the interviewer's own strategy, they won't push back on AI sycophancy either.

None of this tests Skill. All of it tests Will. The hardest part for most companies: accepting that someone who "fails" a traditional coding interview might be exactly the person you need.
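For the taste test, one way to check complementarity, offered as an assumption rather than a validated method, is to have each team member and the candidate rank the same ten items and compute the Spearman rank correlation between the candidate and each member. A correlation near 1 with everyone suggests redundant taste. A low correlation is ambiguous (complementary taste or just noise), which is why the critique and explanation still matter more than the number. The function names and the example rankings are hypothetical.

```python
from typing import Dict, List


def to_ranks(order: List[str]) -> Dict[str, int]:
    """Map each item to its position in a best-first ordering (1 = best)."""
    return {item: pos for pos, item in enumerate(order, start=1)}


def spearman(order_a: List[str], order_b: List[str]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(order_a)
    if n < 2 or len(order_b) != n or len(set(order_a)) != n or set(order_a) != set(order_b):
        raise ValueError("both rankings must order the same distinct items (at least two)")
    ranks_a, ranks_b = to_ranks(order_a), to_ranks(order_b)
    d_squared = sum((ranks_a[item] - ranks_b[item]) ** 2 for item in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def taste_overlap(candidate: List[str], team: Dict[str, List[str]]) -> Dict[str, float]:
    """Correlation between the candidate's ranking and each team member's."""
    return {member: spearman(candidate, order) for member, order in team.items()}


# Ten items labelled A..J, each ranking best-first. Illustrative data only.
team = {
    "product": list("ABCDEFGHIJ"),
    "business": list("BADCFEHGJI"),
}
candidate = list("JIHGFEDCBA")
for member, rho in taste_overlap(candidate, team).items():
    print(f"{member}: {rho:+.2f}")
```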

5. What Breaks

Every organizational model has failure modes. The AI-native team is no exception.

5.1 The Evaluation Problem

In When AI Delivers Results, I mapped out how evaluation infrastructure maturity determines the speed of automation. Accounting has GAAP. Software has test suites. Law has... nothing comparable.

Taste has even less. How do you evaluate whether someone has good judgment? You can measure outcomes — but outcomes depend on luck, timing, and a thousand variables beyond the judgment call itself. A good decision can produce a bad outcome. A bad decision can produce a good outcome. The feedback loop is noisy and delayed.

For the AI-native team, this creates a specific problem: you can't optimize what you can't measure. If taste is the only thing the team is made of, and taste can't be reliably measured, how do you know if the team is good?

Partial answer: portfolio evaluation over time. Not "was this decision correct?" but "over 50 decisions, does this person's judgment produce better outcomes than chance?" This requires patience — years, not quarters — and the discipline to track decisions and outcomes systematically.

This is the evaluation infrastructure opportunity I identified for the broader economy, applied internally.
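A minimal version of "better than chance over 50 decisions" is a one-sided binomial test. It assumes each decision can be scored after the fact as a hit or a miss, and that chance means a 50% hit rate; both are simplifications, and the right baseline depends on the kind of decision. The function name is hypothetical; only `math.comb` (Python 3.8+) is used.

```python
from math import comb


def p_value_vs_chance(hits: int, decisions: int, base_rate: float = 0.5) -> float:
    """One-sided binomial test: P(at least `hits` successes) if judgment were pure chance."""
    if not 0 <= hits <= decisions:
        raise ValueError("hits must be between 0 and decisions")
    return sum(
        comb(decisions, k) * base_rate**k * (1 - base_rate) ** (decisions - k)
        for k in range(hits, decisions + 1)
    )


for hits in (27, 32):
    print(hits, "of 50:", round(p_value_vs_chance(hits, 50), 4))
```

Under these assumptions, 32 hits out of 50 falls below the conventional 0.05 threshold, while 27 out of 50 is well within what luck produces. Binary scoring also ignores how much each decision mattered, so the p-value is one signal, not a verdict.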

5.2 The Culture Problem

In a traditional team, culture emerges from human interaction. Lunch conversations. Hallway debates. The friction of working together. When 90% of the "team" is AI agents, where does culture come from?

The honest answer: it comes from the 5-7 humans. The small team's interpersonal dynamics, shared values, and communication norms become the culture. The agents inherit it through their system prompts — which, as the handbook analysis showed, are culture documents.

But this means culture is both more fragile (depends on fewer people) and more explicit (must be written down, because agents can't absorb it through osmosis). Duolingo's approach — codifying culture into principles and processes — becomes necessary, not optional.

5.3 The Accountability Problem

When an AI agent makes a mistake — prices an insurance policy wrong, generates misleading analysis, ships a feature that breaks things — who's responsible?

In the AI-native team, the answer is clear: the person whose taste domain covers that decision. They reviewed the agent's output (or should have). They set the quality criteria. They own the result.

This only works if the team has clear domain ownership and if every significant agent output goes through human judgment before it ships. Full autopilot — agents making decisions without human review — is tempting but premature. As I argued in the Sequoia analysis, most organizations will settle at L3 (conditional autonomy), not L5 (full autonomy). The AI-native team is L3 by design: agents execute, humans judge.
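"Agents execute, humans judge" can be sketched as a review gate: each agent output is tagged with a taste domain, the domain's owner is looked up, and nothing significant ships until that owner approves it. The domain map, the `significant` flag, and the `ReviewGate` class are hypothetical; a real system would also need escalation for outputs that span domains or have no owner.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AgentOutput:
    """Something an agent produced that may or may not ship."""
    domain: str                        # taste domain the output falls under
    summary: str
    significant: bool                  # significant outputs need human judgment
    approved_by: Optional[str] = None  # set only by the domain owner


class ReviewGate:
    """L3 by design: agents execute, the owning human judges before shipping."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self.owners = owners  # taste domain -> the person who owns it

    def owner_of(self, output: AgentOutput) -> str:
        if output.domain not in self.owners:
            raise LookupError(f"no owner for domain {output.domain!r}; assign one first")
        return self.owners[output.domain]

    def approve(self, output: AgentOutput, reviewer: str) -> None:
        if reviewer != self.owner_of(output):
            raise PermissionError(f"{reviewer} does not own {output.domain!r}")
        output.approved_by = reviewer

    def can_ship(self, output: AgentOutput) -> bool:
        if not output.significant:
            return True  # assumption: minor outputs may ship without review
        return output.approved_by == self.owner_of(output)


# Hypothetical domain map using the roles from the "day in the team" example.
gate = ReviewGate({
    "onboarding": "product person",
    "partnerships": "business person",
    "user data": "analytics person",
})
proposal = AgentOutput("partnerships", "Partnership proposal, frame B", significant=True)
print(gate.can_ship(proposal))  # False: nobody has reviewed it yet
gate.approve(proposal, reviewer="business person")
print(gate.can_ship(proposal))  # True
```

Accountability falls out of the structure: whoever's name is in `approved_by` owns the result.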

Where I Stand

The AI-native team isn't hypothetical. Fragments of it already exist. My own team at Compute Labs operates at the edge of this model — three people with strong Will, AI handling Skill, no hierarchy, shared context, taste-driven decisions.

The pieces that work: massive execution leverage, rapid iteration, the ability to explore more options than any traditional team could.

The pieces that don't work yet: evaluation of taste quality, engineering genuine dissent (not performative counterarguments), and building context fabric that's rich enough to replace the information-routing function of managers.

The three handbook archetypes each contribute a piece of the answer. From MrBeast: the importance of specific, concrete taste encoding — not abstract principles, but operational guidance a new team member (or agent) can actually execute against. From Netflix: the importance of dissent, the keeper test as quality assurance for the team itself, and the courage to part ways with people who are good but not great. From Duolingo: the importance of systematic experimentation, the discipline of measuring what works, and the willingness to kill what doesn't.

None of these is sufficient alone. The AI-native team takes what works from each and discards what doesn't translate.

The remaining unsolved problem is evaluation. We can build the team. We can design the principles. We can hire for Will. But we can't yet measure whether the taste we're selecting for is actually good — or just confidently wrong.

That measurement challenge is the next frontier. Whoever solves it — builds the "GAAP for judgment," the evaluation framework for taste quality — unlocks the full potential of AI-native organizations.

Until then, we build with the best judgment we have, run experiments where we can, and accept that some of the most important decisions in the agent era will be made by gut feel.

The tool era rewarded what you could build. The outcome era rewards what you can judge. The team of the future isn't defined by how much it can produce — it's defined by how well it can tell the difference between good and great.