
My AI Thesis

Over the past year, I built six AI products, worked on GPU infrastructure financing at Compute Labs, and wrote close to 200 blog posts dissecting different facets of this landscape — companion AI, agent economics, chip wars, the future of work, what happens to SaaS moats, what happens to jobs. Each post captured one angle. None captured the whole picture.

This post is the synthesis. Not a recap of what I've written — a unified framework connecting all of it into one coherent view. The thesis I actually operate on when I'm deciding what to build, where to invest time, and how to think about the next 3-5 years.

Not predictions. A working model.

When Code Becomes Disposable

The shift I keep coming back to: software is becoming disposable.

Agents already generate code, use it for a task, and discard it. A non-technical member of our team built a social media scheduling tool with Claude Code — zero programming background. He described what he wanted, iterated with the agent, and shipped something functional. The tool itself isn't the point. The point is that the tool cost almost nothing to create and could be rebuilt from scratch in an afternoon.

When building is that cheap, what's actually durable?

Cognition. Not the code. Not the model weights. The accumulated understanding an AI builds of a specific person over months of interaction — decision patterns, emotional triggers, communication preferences, the thousand small signals that make up how someone actually thinks. Three months of conversation history mapping those patterns is non-replicable, non-compressible, non-accelerable. You can't train a shortcut to it. You can't copy-paste it. You have to earn it in real time.

This is the foundation everything else rests on. The moat isn't technology. It's relationship.

And here's why that matters for everything that follows: if cognition is the durable asset, then the entire value chain reorganizes around it. The AI that understands you best wins — not the one with the best model, the best UI, or the best marketing. This means personalization, which was previously a luxury reserved for the wealthy (personal assistants, concierge doctors, private tutors), becomes economically viable for the first time. The CEO lifestyle becomes a subscription product.

Three Demand Curves

I see three distinct demand curves converging on the same bottleneck. Each comes from a different market. All of them want the same thing: more compute.

Corporate: Agents Replacing Headcount

My team of 3 engineers produces what used to require 15-20. The multiplier isn't hypothetical — I've tracked it across six products shipped in rapid succession. Everyone became a manager of AI agents. You describe the task, it executes, you review.

This is the new core competency. Not coding. Management. Knowing how to decompose problems, allocate agent resources, evaluate output quality. The skills that made someone a good engineering manager now make someone a good individual contributor.

Big tech already internalized this. Meta has cut headcount in multiple rounds since 2025. The era of coasting on headcount-as-vanity-metric is over. I saw it firsthand at Apple β€” people coming in at 10, leaving by 3, building 10 projects and launching 1. That culture cannot survive the existence of agents that work around the clock without needing a skip-level to feel valued.

The corporate demand curve is the most immediate of the three because the ROI calculation is brutally simple. An enterprise seat costs $100-200/month. A knowledge worker costs $8,000-15,000/month fully loaded. If the AI handles even 30% of their output, it's paid for itself ten times over. CFOs don't need convincing. They need procurement approval.
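A back-of-envelope version of that calculation, taking the conservative end of the ranges above (all figures are illustrative assumptions from the text, not measured data):

```python
# Enterprise AI seat ROI, conservative end of the ranges quoted above.
seat_cost = 200          # $/month, enterprise AI seat
worker_cost = 8_000      # $/month, knowledge worker, fully loaded
output_handled = 0.30    # fraction of the worker's output the AI covers

value_created = worker_cost * output_handled   # $2,400/month of labor value
roi_multiple = value_created / seat_cost       # 12x the seat cost

print(f"value: ${value_created:,.0f}/mo, ROI: {roi_multiple:.0f}x")
```

Even at the cheap end of worker cost and the expensive end of seat cost, the seat pays for itself more than tenfold, which is the whole CFO pitch in one line.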

Creator: The Printing Press Moment

We're living through something analogous to the printing press. Before Gutenberg, creating a book required a monastery. After, it required a printing shop. Neither eliminated the need for literacy — they shifted the bottleneck from production to having something worth saying.

AI agents are doing the same thing for software. Every person with a repetitive workflow is a potential builder. Every niche problem that was never worth a startup's time can now be solved by the person who has it.

This is why SaaS moats are collapsing. The traditional SaaS playbook — lock in data, create workflow dependency, build integration moats, charge 75%+ gross margins — faces a structural threat. When a user can describe their problem to an agent and get a custom solution in hours, "good enough" off-the-shelf software at premium prices becomes an arbitrage opportunity waiting to be exploited.

Mass Market: Personal AI for Everyone

This is the demand curve most people misunderstand, because they think it means better chatbots. It doesn't.

Pure chat hits a structural ceiling. Every AI companion today converges on the same problem: conversations get boring. Not because the AI isn't smart enough, but because chat is one-dimensional. Real relationships need three dimensions — dialogue, time (shared history that creates inside jokes and callbacks), and space (ambient presence, observation, knowing what's happening without being told).

Wearables are the missing piece. Observation-based knowledge — "I noticed you coded until 2am three nights this week" — carries emotional warmth that self-reported knowledge never will. The difference between a friend who asks "how are you?" and one who says "you look exhausted" without you saying a word.

What was previously available only to CEOs — a chief of staff who knows your context, a therapist who tracks your patterns, an executive assistant who anticipates your needs — becomes a subscription product. Everyone gets that support for $100-200/month.

And personality should emerge, not be designed. The companion AI companies that try to hand-craft a character will lose to the ones that design guardrails and let personality grow from the relationship itself. Real relationships aren't scripted. The AI that feels alive is the one that develops its voice through months of interaction with a specific person — not the one that was written by a character designer in a product sprint.

The Convergence

Corporations are cutting headcount. Creators are building tools. Everyone else is waiting for an AI that actually understands them.

All three roads lead to the same bottleneck: compute.

And unlike previous technology waves, these markets are additive, not substitutional. The corporate buyer replacing headcount with agents is also the creator building personal tools on weekends, and also the consumer wanting a personal AI at home. Same person, three demand vectors. Compute demand isn't growing linearly — it's compounding.

Compute Economics

The same level of AI performance costs roughly 97% less than it did two years ago.

Concrete numbers: GPT-4 level output cost $30 per million tokens when it launched in 2023. Today, comparable quality runs under $1. Claude Opus output pricing dropped from $75 to $25 per million tokens — a combination of TPU migration, inference optimization, and competitive pressure.
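The headline deflation figure falls straight out of those prices. A quick sanity check using only the numbers quoted above:

```python
# Sanity check on the deflation figures quoted above.
gpt4_launch_price = 30.0   # $/M tokens at GPT-4 launch (2023)
gpt4_now_price = 1.0       # $/M tokens for comparable quality today
drop = 1 - gpt4_now_price / gpt4_launch_price   # ~0.967, i.e. "roughly 97%"

opus_then, opus_now = 75.0, 25.0
opus_drop = 1 - opus_now / opus_then            # ~0.667, a two-thirds cut

print(f"GPT-4 class: {drop:.1%} cheaper, Opus: {opus_drop:.1%} cheaper")
```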

I expect this deflation to continue for at least 5 more years. Chip iteration cycles, model distillation techniques, and cloud price wars all have substantial runway left. Each order-of-magnitude cost drop unlocks a new wave of adopters, the same way smartphones unlocked mobile internet when hardware got cheap enough. GPS existed before the iPhone. So did maps and email. Cheap hardware in every pocket made them universal.

The critical structural shift: inference now consumes more compute than training. Deloitte confirmed this crossover in late 2025, and it's accelerating. Training GPT-4 cost an estimated $100 million or more — a one-time expense. But every user burns inference every single day. Agents make this worse. A single agentic coding session spawns hundreds of inference calls — reading context, deciding actions, executing, spawning sub-agents, verifying results. Agents are compound inference. Training is periodic; inference is forever.
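A toy model makes the compounding visible. The step counts below are invented purely to show the shape of the fan-out, not measured from any real agent:

```python
# Illustrative model of "compound inference": one agentic task fans out
# into many model calls. All counts are hypothetical, chosen for scale.
def session_calls(steps=50, calls_per_step=3, subagents=2, subagent_steps=10):
    # each main-loop step: read context, decide an action, verify the result
    main_loop = steps * calls_per_step
    # each sub-agent runs its own smaller loop of the same shape
    spawned = subagents * subagent_steps * calls_per_step
    return main_loop + spawned

print(session_calls())  # 210 calls for a single coding task
```

A chat turn is one inference call; under these assumptions a single agent session is a couple hundred, which is why agents shift the cost center from training to inference.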

Here's where Jevons paradox kicks in: costs are plummeting, but total compute spend is exploding. When coal-fired steam engines got more efficient, total coal consumption went up, not down, because efficiency made new uses economical. Cheaper tokens mean more tokens consumed, not less money spent. Every cost reduction expands the addressable market faster than it shrinks the per-unit price.

The current subsidy window makes this even more dramatic. Claude Code's subscription plan delivers roughly 40-50x the value in raw API costs. That's Uber-era pricing — venture capital subsidizing below-cost access to build habit and market share. It won't last forever. Exploit it while it's available.

The Silicon Wars

I've written a dedicated deep dive on this, so I'll keep this to the strategic picture.

Nvidia's moat is CUDA, not silicon. You can build a better chip. You can't build a second CUDA ecosystem. Every ML engineer learned on CUDA. Every framework defaults to it. Every optimization trick, every debugging tool, every Stack Overflow answer assumes it. Switching costs aren't just technical — they're cultural and educational. Jensen understands this and plays accordingly, investing in or acquiring inference chip startups before they reach escape velocity.

TPU is a focused bet. Purpose-built for matrix multiplication, the mathematical heart of transformer-based models. At what it's designed for, it's excellent — Anthropic's cost reduction after moving inference to TPU proves the economics. But ASICs are inflexible by design. If computing requirements shift away from matrix operations (unlikely near-term, but paradigm shifts are the norm in this field), TPUs become expensive paperweights. And you can't buy 100 TPUs and rack them — the architecture requires tens of thousands of chips wired together with custom interconnects. Google-scale or nothing.

GPU's advantage is flexibility — you can build anything from a 10-rack setup to a 10,000-card cluster. But networking overhead is brutal: expect roughly $1-1.5 in infrastructure for every $1 spent on GPUs. Power delivery, cooling, high-bandwidth interconnects. The GPU itself is almost the cheap part.
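That ratio turns into a simple rule of thumb for cluster budgeting. A sketch with the 1-1.5x range from above as the only input:

```python
# Rule-of-thumb cluster cost: roughly $1-1.5 of infrastructure (power,
# cooling, networking) for every $1 of GPUs. Ratio midpoint assumed.
def cluster_total_cost(gpu_spend, infra_ratio=1.25):
    return gpu_spend * (1 + infra_ratio)

print(cluster_total_cost(10_000_000))  # $10M of GPUs -> $22.5M all-in
```

Whatever the GPU line item says, budget at least double before a single token is served.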

ASIC challengers face a two-front war: ecosystem lock-in (CUDA) and capital requirements (billions to reach competitive scale). The chip wars won't be decided in the lab. They'll be decided in developer communities and hyperscaler procurement offices.

Big Tech Positioning

Google is the comeback story. Sergey Brin returned, discovered employees couldn't even use Gemini for coding due to internal legal policy, escalated to Sundar, and injected urgency into a company that had been coasting on search revenue. Gemini 3 Flash is genuinely competitive. Google has distribution (billions of users across Search, Gmail, Android, Chrome), custom silicon (TPU), and the original transformer research team. That's a terrifying combination when focused. The question was always whether Google could focus. They're starting to answer it.

OpenAI has a side quest problem. Hardware with Jony Ive, a browser, ads, image generation, video, a social network. When Sam came back to the main storyline, the final boss had leveled up. The core battle is enterprise productivity — replacing expensive white-collar workflows with AI agents. OpenAI reportedly refocused at a recent offsite. Whether that sticks remains to be seen. Their strength was always raw frontier capability. Their weakness is strategic discipline.

Anthropic understood enterprise unit economics before anyone else. A tool that saves a $200K/year employee half their time is worth $50K/year easily — that's not consumer SaaS pricing, that's productivity pricing. They've been almost exclusively focused on enterprise and agentic work, and the results show: revenue grew from roughly $1B to $19B ARR in about a year, with approximately 80% coming from enterprise contracts. Focus as strategy.
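The pricing logic is simple enough to write down. A sketch using the figures above (the 50% value-capture split is my illustrative reading of "$50K/year easily", not a quoted number):

```python
# Productivity pricing: price against labor cost saved, not against
# comparable SaaS seats. Figures are the assumptions from the text.
salary = 200_000      # $/year, fully loaded employee cost
time_saved = 0.50     # fraction of the employee's time the tool frees up

value_created = salary * time_saved   # $100,000/year of labor saved
price = 50_000                        # what the text calls an easy ask
value_capture = price / value_created # vendor keeps half the surplus

print(f"value: ${value_created:,.0f}/yr, capture: {value_capture:.0%}")
```

A $50K seat looks absurd next to a $30/month SaaS tool and entirely reasonable next to half an analyst.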

Meta has distribution but declining execution. Llama 4 launched months ago with relative silence since. Good AI engineers left for VC-funded startups — they're not coming back for four-year RSU vests when they can build their own thing. But billions of users across Facebook, Instagram, and WhatsApp buy time. Distribution always does. Never count Meta out entirely.

The Centaur Window

Here's the paradox most people miss: short-term disruption is overestimated while long-term disruption is catastrophically underestimated.

The hype narrative says AI is replacing everyone tomorrow. Reality: roughly 78% of people have only "tried" AI tools, and an estimated 95% have seen zero measurable ROI. Most organizations are still in the experimentation phase. The last mile isn't technical — it's cognitive. The vast majority of people simply don't know what's possible yet.

But the long-term trajectory is relentless. Entry-level hiring is already shrinking. Cost curves don't reverse. Every quarter, the gap between "what AI can do" and "what most people use it for" widens.

And narratives cause real damage even before capabilities arrive. Companies lay off based on potential, not current performance. The story of AI replacing workers is, right now, causing more economic disruption than the actual technology. A CEO reads that AI can do 80% of a junior analyst's job, cuts the team by 40%, and then discovers the AI can actually do about 30% reliably. The remaining employees are overloaded, the AI tools underdeliver, and everyone suffers. But the headcount doesn't come back. Narratives are a one-way ratchet.
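The mismatch in that scenario is easy to quantify. A sketch with a hypothetical ten-person team and the percentages above:

```python
# Narrative-vs-capability gap from the scenario above.
# Hypothetical ten-person junior team; percentages from the text.
team = 10
claimed_pct = 80   # "AI can do 80% of the job" (the headline)
actual_pct = 30    # what the AI handles reliably today
cut_pct = 40       # headcount cut made on the headline, not the reality

work_remaining = team * (100 - actual_pct) / 100    # 7.0 person-equivalents
people_remaining = team * (100 - cut_pct) / 100     # 6.0 people left
overload = work_remaining / people_remaining        # ~1.17x load per person

print(f"each remaining person now carries {overload:.2f}x a full workload")
```

The cut was sized for the headline number; the workload shrank by the real number, and the difference lands on whoever is left.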

The centaur phase — where human-plus-AI dramatically outperforms either alone — is open right now. But it's closing. I give it a 2-3 year window, roughly 2026-2027, before the AI side of the equation becomes strong enough that the human contribution in many domains shrinks to taste and judgment.

The super individual is the endgame of this window. One person plus an agent team equals one company. The solo founder who can compete with a 50-person startup because agents handle engineering, design, marketing, and operations while the human provides direction and quality judgment.

Taste becomes the last moat. Not taste in the aesthetic sense — taste as knowing which problem is worth solving, which output is good enough to ship, which direction to push when the agents give you five options. It's editorial judgment applied to everything.

The infrastructure opportunity here is enormous. Agents that live portions of your life for you — socializing, filtering information, negotiating, scheduling — need identity systems, payment rails, reputation frameworks. The platform that provides agent infrastructure (how agents authenticate, transact, and build trust) is the next AWS-scale opportunity. We're not there yet, but the demand is already forming.

Everyone is debating whether AI will replace humans. The real question is whether you'll learn to direct it before the window closes.

Where This Leaves Us

Distribution is still the bottleneck.

I built six products in two months. Building has never been faster. But getting people to use them? That's still the hard part. You still need to understand what people want. You still need to earn trust. You still need to find channels, craft messaging, and convince someone to change their behavior. AI hasn't automated any of that.

When building is cheap, the scarce resource shifts to everything that comes after: positioning, trust, distribution, and sales. The trillion-dollar reallocation isn't just compute moving from training to inference. It's value moving from building to distributing.

This is the part that most technical founders — myself included — find uncomfortable. The skills that got us here (building fast, understanding systems, shipping clean code) are being commoditized by the very tools we built. What remains scarce is the ability to understand what people need before they can articulate it, to earn trust at scale, and to create distribution channels that compound. The technical moat is dissolving. The human moat is deepening.

The window right now is remarkable. Compute is subsidized. The centaur phase is open. Cost curves are your friend. The tools available to a small team in 2026 would have been unimaginable to a large team in 2023.

What I haven't figured out: how the mass-market companion curve actually plays out — specifically, whether the right form factor is a phone app, a wearable, or something that doesn't exist yet. Whether the ASIC challengers have any realistic path against Nvidia's ecosystem lock-in, or if CUDA's gravity is truly inescapable. How quickly the centaur window actually closes, and what the economy looks like on the other side.

These are the open questions. If you have better answers, or if you think I'm wrong about any of this, I'd like to hear it.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0