
CLAUDE.md for Humans

Three companies recently made their employee handbooks public. MrBeast wrote 36 pages — raw, unpolished, typos included, reads like a founder's brain dump at 2 AM. Netflix distilled theirs to 5 pages of dense organizational philosophy. Duolingo designed a 64-page illustrated book with version numbers (1.0.0), a mascot anatomy diagram on the cover, and "O.BUTT" labeled on the back.

All three generate on the order of $1M in revenue per employee. All three published documents that attempt to solve the same problem.

That problem: how do you transfer your taste, judgment, and operating principles to another entity — so it can execute autonomously without constant supervision?

If you work with AI agents, you recognize this immediately. It's the exact problem a CLAUDE.md file solves. You write down how you think, what you care about, what "good" looks like, what to never do. The agent reads it and acts accordingly.

These handbooks are CLAUDE.md files for humans.


Three observations:

1. The taste transfer problem is the same whether the recipient is a new hire or an AI agent. All three handbooks encode the founder's judgment into a document. The approaches differ radically — brain dump, constitutional principles, or operating system — but the underlying challenge is identical to writing a system prompt.

2. Three archetypes: Founder-Brain, Context-Not-Control, Experiment Machine. MrBeast amplifies one person's creative vision. Netflix distributes autonomous judgment across thousands. Duolingo builds a machine that generates and evaluates experiments. Each has different implications for how organizations absorb AI.

3. $1M/employee is the output of technology leverage, not a strategy. All three refuse to scale headcount linearly with revenue. They found different leverage mechanisms — YouTube's algorithm, Netflix's recommendation engine, Duolingo's product-led growth. These companies are already proto-agentic organizations. They got there before AI agents existed.


1. The Taste Transfer Problem

Every CLAUDE.md file I've written answers the same questions: What are we optimizing for? What does "good" look like? What are the non-negotiable constraints? What should you do when you encounter ambiguity?

These handbooks ask the same questions. They just answer them for humans.

1.1 MrBeast: The Brain Dump

Jimmy Donaldson's approach is the most direct. He opens with an apology — "Sorry in advance for all the run on sentences and grammar issues, I'm a youtuber not an author haha" — then proceeds to dump his entire operating philosophy into a document.

The handbook IS Jimmy's brain. Not a processed, corporate-communications version of it. The raw thing. He tells you what metrics matter (CTR, AVD, AVP). He tells you his decision-making framework ("Math, Science, Vision, Approvals, Budget — everything you need can be solved by one of these 5 things"). He tells you the exact emotional register he wants ("I want money spent to be shown on camera. If you're spending over $10,000 on something and it won't be shown on camera, seriously think about it").

This is system prompt engineering, applied to humans.

The parallel to CLAUDE.md is almost eerie. Consider his section on how to bring him a question: "Instead of saying 'in a coming up video we are giving away a car, what do you think of this lexus it's only $10,000' ... do this instead: 'We have a coming up video. One of the bits at the 6 to 9 minute mark we will be giving away a car. We are still on budget and the budget for this car is $10,000. I searched all of NC for cool ass cars around that price point and here are 5 I found that I got preapproved by creative. I also got 5 other backup options.'"

That's not a communication guideline. That's a prompt template. He's teaching employees to provide sufficient context, define constraints, present options, and reduce his cognitive load — the same things you'd teach an AI agent through its system prompt.
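To make the parallel concrete, here's how the same pattern might look transcribed into a CLAUDE.md-style instruction. This snippet is purely illustrative — it's not from any of the three handbooks — but it encodes the same requirements Jimmy spells out:

```markdown
## How to bring me a decision

When you need sign-off, include all of this in one message:
- Context: which project, which part, and the relevant slot or deadline
- Constraints: the budget or limits already agreed, and whether we're inside them
- Options: 3-5 researched candidates that already pass prior approvals
- Backups: alternatives in case the first choices fall through

Never ask an open-ended "what do you think?" without the above.
```

Same structure, same function: shift the cognitive load from the decision-maker to the person (or agent) asking.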

1.2 Netflix: The Constitution

Netflix takes the opposite approach. Where MrBeast writes specifics, Netflix writes principles. The entire document rests on four pillars: The Dream Team, People over Process, Uncomfortably Exciting, Great and Always Better.

The operational philosophy is one sentence: "Act in Netflix's best interests." The vacation policy is two words: "Take vacation." The expense policy is five words: "Act in Netflix's best interests."

This is a CLAUDE.md that says: I trust your judgment. Here are the values. Figure out the rest.

The bet is that you hired the right people. Netflix's "keeper test" — "knowing everything I know today, would I hire X again?" — is quality assurance for the human agents in the system. If the answer is no, you replace them. Not because they failed. Because someone better exists.

In AI terms, Netflix's approach is: give the agent a short, high-quality system prompt with strong values, then evaluate outputs aggressively. If the agent consistently underperforms, swap models.

1.3 Duolingo: The Operating System

Duolingo's handbook reads like neither a brain dump nor a constitution. It reads like documentation for a piece of software.

Five principles with clear names (Take the Long View, Raise the Bar, Ship It, Show Don't Tell, Make It Fun). An operational framework called "The Green Machine" with six sequential steps. A glossary defining internal terms (what's a "Nuo"? what's "PR"? what does "Go, Go, Go!" mean?). Version numbers. A table of contents.

This is a CLAUDE.md that doesn't depend on any single person's taste. The system itself generates taste through experimentation. Run hundreds of experiments per week. Let data decide. Double down on what works, kill what doesn't. The "clock speed" concept — minimizing the gap between a decision and its implementation — is the same principle that makes AI agent loops effective: faster iteration beats better planning.

Duolingo's approach to quality illustrates the difference. They don't ship MVPs (Minimum Viable Products). They ship V1s (Version 1s). The distinction matters: "MVPs often have a lower standard of quality and can be used as an excuse to ship subpar work. V1s, on the other hand, are polished. They may not have all the bells and whistles, but they meet our bar."

That's not a style choice. It's a judgment-encoding choice. By naming the concept "V1" instead of "MVP," they've embedded a quality standard into the vocabulary itself. Every time someone says "let's ship the V1," the word carries the standard with it.

2. Three Archetypes for the AI Era

Each handbook represents a distinct organizational archetype — and each has different implications for how AI agents integrate.

2.1 MrBeast: One Visionary, Many Executors

Everything flows from one person's taste. Jimmy personally evaluates every video's "wow factor." He sets the creative direction for four channels, three businesses, and a charity. The entire organization exists to amplify one person's vision.

The MrBeast model maps most naturally to human + AI agent teams because it's already structured that way. One human holds the taste. Everyone (and eventually, everything) else executes within that frame. Replace the production team with AI agents and the architecture barely changes — the founder still decides what "good" looks like, agents handle execution.

The ceiling of this model is the founder's bandwidth. AI agents raise that ceiling dramatically. Jimmy currently can't personally review every detail of every video across every channel. With AI agents handling execution, he could — or at least get much closer.

But there's a critical detail in the handbook that complicates this: "I am not always right." Jimmy explicitly tells employees to push back, bring options, challenge his assumptions. James Warren — who "understands every single part of this company at a deep level" — can solve problems Jimmy hasn't even identified. That human pushback is load-bearing. Remove it, and the founder's blind spots go unchecked.

Current AI agents can't genuinely push back. They can be instructed to generate counterarguments, but they lack the independent judgment to know when dissent actually matters. The MrBeast model with AI agents risks losing the very dissent that makes it work.

2.2 Netflix: Distributed Autonomous Judgment

Netflix's model is the most sophisticated and the hardest to translate to AI.

"Highly aligned, loosely coupled" teams. "Informed captains" who make decisions independently. "Farming for dissent" — actively seeking out people who disagree before committing to a direction. "Context not control" — managers give teams the information and clarity needed to make good decisions, rather than making the decisions themselves.

This model depends on something AI agents don't have yet: genuine autonomous judgment about ambiguous situations. Netflix wants employees who can assess a situation, form an independent opinion, advocate for it, then disagree and commit when the decision goes the other way.

Netflix's "(almost) no rules rule" is the most telling detail. They minimize process to avoid "the process creep that typically happens when companies grow and try to dummy proof their organizations." But this only works with "unusually responsible people" — the kind who pick up trash without being asked and don't need policies to do the right thing.

In AI terms, Netflix is asking for agents with strong general reasoning, minimal guardrails, and the ability to operate with full autonomy within a value system they've internalized. That's not where current agents are. We're still in the "detailed system prompt" era, not the "internalized values" era.

The paradox: the organizational model that produces the best human culture is the hardest to replicate with AI.

2.3 Duolingo: The Experiment Loop

Duolingo's model is the most naturally AI-amplifiable.

The Green Machine is already an algorithm: (1) Staff it with great people, (2) Define success metrics, (3) Set guardrails and think long-term, (4) Build the thing and set up feedback loops, (5) Execute with urgency, (6) Double down on what works, stop what doesn't.

AI agents slot into every step of this pipeline. Step 1 stays human (hiring). Steps 2-3 involve human judgment. But steps 4-6 — build, measure, iterate — are exactly what agent loops do best. Running hundreds of experiments per week, analyzing results, generating variants, measuring metrics. The data decides, not any individual's taste.

Duolingo already runs "hundreds of experiments across the company at any given time." Each experiment is a small bet. Most fail. The ones that work get doubled down on. This is structurally identical to how you'd run an AI agent fleet: spawn many agents, evaluate outputs, keep what works, kill what doesn't.
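That fleet pattern can be sketched in a few lines. Everything here is hypothetical — the variant names, the simulated metric, the threshold — it only shows the structural loop the essay describes: spawn many bets, evaluate, keep the winners, iterate.

```python
import random

def measure(variant: str) -> float:
    """Stand-in for running one experiment (an A/B test, or an agent's
    output evaluation) and reading back its metric. Simulated here."""
    return random.random()

def experiment_loop(variants: list[str], bar: float = 0.7, rounds: int = 3) -> list[str]:
    """Spawn many bets, keep what clears the bar, double down, repeat."""
    for _ in range(rounds):
        scored = {v: measure(v) for v in variants}
        winners = [v for v, score in scored.items() if score >= bar]
        # Double down: each winner spawns a follow-up variant; losers are dropped.
        variants = winners + [f"{v}+tweak" for v in winners]
    return variants

survivors = experiment_loop(["streak-freeze", "push-copy-a", "leaderboard-v2"])
```

The interesting property is that no line of this loop encodes anyone's taste — the bar and the metric do all the judging, which is exactly the Green Machine's point.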

The "99 Bad Ideas" tradition — where leadership brainstorms "ridiculous, unlikely questions" like "What if you could talk with Duolingo characters? What would Duo do with five seconds at the Super Bowl?" — is basically temperature-cranked generation followed by human curation. Generate widely, filter ruthlessly. That's the agent workflow.

3. What $1M/Employee Actually Measures

The revenue-per-employee numbers:

Company     ~Employees    Revenue    Rev/Employee
MrBeast            250    $700M+     $2.8M
Netflix         14,000    $38B       $2.7M
Duolingo           900    $700M+     $780K

These numbers are impressive but they're outputs, not strategies. The real pattern is what these companies refuse to do: scale headcount linearly with revenue.

Each found a leverage mechanism:

  • MrBeast: YouTube's distribution algorithm. 50 people make a video, 100 million watch it. The algorithm IS the sales team.
  • Netflix: recommendation engine + global infrastructure. Software serves 260M+ households. Content creation still scales with people; distribution doesn't require headcount.
  • Duolingo: product-led growth + gamification. The owl does the marketing. The app teaches 100M+ MAU. The Streak feature drives daily engagement without human intervention.

These are already proto-agentic organizations. Between the humans and the output, there's a technology layer that multiplies the effect of each person's judgment by orders of magnitude.

3.1 The Taste Bottleneck

This connects directly to the Dao/Skill framework. All three companies are "taste bottleneck" organizations. The constraint isn't execution capacity — it's the bandwidth of the people who know what "good" looks like.

MrBeast: Jimmy's taste in what makes a YouTube video go viral. Netflix: the taste of executives who decide what content to commission and what culture to maintain. Duolingo: Luis von Ahn's taste in what makes learning delightful and sticky.

AI agents don't expand taste. They expand execution. That's why these organizational structures already work for the AI era — they've identified taste as the scarce resource and built everything else as an amplification layer.

This is the same conclusion I reached in the Sequoia analysis: the intelligence layer gets automated, the judgment layer stays human. These three companies figured out the division of labor before "AI agents" was even a category.

3.2 Where the Models Converge and Diverge

All three handbooks share certain principles:

Results over hours. MrBeast: "The amount of hours you work is irrelevant. We are a results based company." Netflix: evaluate on "whole record" of performance, not effort. Duolingo: "ruthless prioritization" — cut what doesn't move metrics.

A-players only. MrBeast: "There is only room in this company for A-Players. C-Players are poisonous." Netflix: the keeper test. Duolingo: "Better a hole than an a**hole."

Extreme ownership. MrBeast: "It's your fault, track the contractor." Netflix: "informed captains" who own decisions. Duolingo: "Taking Ownership — putting a person or team on a task, providing a clear mandate and saying, 'You are responsible.'"

Creativity over money. MrBeast: "$20,000 prize vs. a year's supply of Doritos for $1,825." Netflix: "use creativity to save money" (though Netflix spends plenty). Duolingo: the 5-second Super Bowl ad that generated as much buzz as a 30-second spot at a fraction of the cost.

Radical candor. MrBeast: "I'd rather you be honest with each other than nice." Netflix: "extraordinary candor — ensuring constructive feedback is part of our everyday work." Duolingo: "Hard on the work, easy on the people."

Where they diverge is on the question of control. MrBeast is top-down — one person's taste drives everything. Netflix is decentralized — context not control, farming for dissent. Duolingo is systematic — the Green Machine decides, not any individual.

In the AI era, this divergence maps to different agent architectures:

Archetype   Human Role                          Agent Role                  Control Pattern
MrBeast     Vision + taste + final call         Execution within specs      Centralized human, delegated agent
Netflix     Autonomous judgment within values   Supporting analysis         Distributed human, advisory agent
Duolingo    Set metrics + guardrails            Run experiments + measure   Systematic: human designs the machine, machine runs

Where I Stand

The CLAUDE.md parallel is more than a metaphor.

Writing a CLAUDE.md file forces you to articulate things you've never had to articulate — your decision-making heuristics, your quality bar, your non-negotiable constraints, the specific way you want information presented to you. Most of that lives in your head as tacit knowledge. Putting it into words is hard.

These handbooks represent the same exercise, performed for an entire organization. And the quality of the handbook — how precisely it encodes the founder's judgment — directly predicts how autonomous the organization can be.

MrBeast's handbook works because it's so specific. "Know what minute mark of the video you're working on." "Say the negatives, not just the positives." "A year's supply of Doritos is 1,825 packs." That level of specificity is what makes a good system prompt — not abstract principles, but concrete operational guidance.

Netflix's handbook works because it selects for people who don't need specifics. The keeper test, the emphasis on self-motivation, the minimal rules — this is a system prompt that says "you should be smart enough to figure out the rest." It works, but only with a very particular kind of agent.

Duolingo's handbook works because it encodes judgment into a process rather than a document. The Green Machine doesn't need every engineer to share Luis's taste. It needs them to run experiments, measure results, and iterate. The process generates good outcomes regardless of who operates it.

The uncomfortable implication: the organizational model that maps most easily to AI agents (MrBeast — one visionary, many executors) is also the most fragile. It depends entirely on one person's judgment and loses the human pushback that keeps that judgment honest.

The model that's hardest to replicate with AI (Netflix — distributed autonomous judgment) is also the most resilient. It doesn't depend on any single person's taste. But it requires agents that can genuinely think independently — and we're not there yet.

The model that's most immediately amplifiable by AI (Duolingo — the experiment machine) is the one where taste emerges from the system rather than from any individual. Replace some of the experiment designers with agents, and the machine keeps running. Maybe faster.


The tool era tested what you could execute. The AI era tests what you can encode — your judgment, your taste, your sense of what "good" looks like — into a system that runs without you.

Three companies, three handbooks, three system prompts. Same problem as writing a CLAUDE.md. Harder to solve when the agent is human.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0