Compute Is the Root of Everything
I'm the co-founder and CTO of Compute Labs — we finance GPU infrastructure, helping people buy chips, build clusters, and turn compute into financial assets. Outside of that, I built six AI products in two months — companion AI, fortune-telling, jewelry recommendations — purely out of curiosity.
One is my startup. The other is a hobby. For a long time I thought they had nothing to do with each other.
They're the same problem.
Sometimes the constraint is that the model isn't smart enough — it can't plan well enough, or reason through a multi-step problem, or use tools reliably. That's a training compute problem. Sometimes the constraint is that inference is too expensive — the model could do the job, but running it at scale would bankrupt the product. That's an inference compute problem. Both roads lead to the same place: compute. Every opportunity I see in infrastructure traces forward to what products become possible. The two layers aren't just connected — they're the same variable observed from different altitudes. This post is about that variable.
Intelligence Is a Compute Problem
Intelligence used to be mysterious. We couldn't explain it, couldn't define it, couldn't build it. Philosophers spent centuries debating whether it was a soul, an emergent property of neural complexity, or something else entirely.
Now it's an engineering problem.
That's not a claim about consciousness or sentience. I'm not saying GPT-4 is "aware" or that running more inference makes something alive. I'm making a narrower, practical observation: for the purposes of building useful things — products, tools, agents, companions — intelligence reduces to computation. Tokens processed, speed of processing, cost per token. That's it.
When I built six AI products in two months, every capability decision was a compute decision. How many agent iterations can we afford per user action? How much context can we load before latency kills the experience? Can we run continuous background inference for proactive features, or does the cost make it impractical?
Sometimes the answer was "the model isn't smart enough" — and then a new generation would arrive with better tool calling, better planning, better reasoning, all powered by more training compute. Sometimes the answer was "we can't afford to run this at this price point" — and then inference costs would drop another 60%, powered by more efficient inference compute. Models have gotten meaningfully smarter AND meaningfully cheaper. Both matter. Both trace back to compute.
This flips the conventional narrative. Most people think AI progress means smarter models. It does — and that progress takes enormous training compute. But smarter models alone aren't enough. The other binding constraint is whether you can afford to deploy what exists. Intelligence isn't abundant — frontier capability is still hard-won and compute-intensive. Affordable intelligence is even scarcer.
And that scarcity has exactly one input: compute — both the training compute that makes models smarter and the inference compute that makes them affordable.
Cost Determines the Boundary of Possibility
This is the insight I keep returning to. Not which model is best. Not which architecture wins. The question that actually determines what gets built is: what can you afford to run?
Every major trend in AI maps directly to a cost threshold being crossed.
Agents exist because inference got cheap. A single agentic coding session — the kind I run hundreds of times a week — spawns dozens to hundreds of inference calls. The agent reads context, reasons about the task, generates code, spawns sub-agents, reviews its own output, iterates. Each step costs tokens. When GPT-4 launched at $30 per million input tokens and $60 output, running an agent loop was economically insane. Andrew Ng calculated that GPT-4's blended token price fell from $36/MTok to $4 in just 17 months — a 79% annual decline. a16z calls this phenomenon "LLMflation": inference costs for equivalent capability dropping roughly 10x per year. At Claude Sonnet 4.6 pricing ($3/$15), it's viable for power users. At flash-tier pricing ($0.15-0.25), it's viable for everyone. The models got dramatically better — tool calling, multi-step planning, code generation all improved by leaps. And the price dropped by an order of magnitude. Both improvements required more compute: training compute to make the models smarter, inference compute to make them cheaper to run.
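A quick sanity check on the decline rate: the $36/MTok and $4/MTok figures and the 17-month window are the ones cited above from Andrew Ng's calculation; the annualization is just standard compound decay.

```python
# Annualized price decline implied by GPT-4's blended token price
# falling from $36/MTok to $4/MTok over 17 months (figures cited above).

start_price = 36.0   # $/MTok, blended, at launch
end_price = 4.0      # $/MTok, blended, 17 months later
months = 17

# Convert the 17-month price ratio to a 12-month equivalent.
annual_ratio = (end_price / start_price) ** (12 / months)
annual_decline = 1 - annual_ratio

print(f"annual price ratio: {annual_ratio:.2f}")    # 0.21
print(f"annual decline:     {annual_decline:.0%}")  # 79%
```

The same formula recovers the ~10x/year "LLMflation" rate if you plug in a full order-of-magnitude drop over 12 months.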
AI companions are viable because personalization requires continuous cheap inference. A companion that truly knows you needs to process your context constantly — not just when you talk to it, but in the background, building understanding from observation and history. That's an always-on inference workload per user. At 2023 prices, this was $100+/month per user in raw API costs. At 2026 prices, it's a rounding error. The chat ceiling will break not because models get smarter, but because always-on multi-modal inference becomes cheap enough to add the missing dimensions — time and ambient presence — that pure chat lacks.
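The per-user arithmetic is rough but illustrative. The daily token volume below is my assumption for a background-processing workload, not a measured figure; the price points are the GPT-4-era input price cited earlier and a representative flash-tier price.

```python
# Back-of-envelope: monthly raw API cost of an always-on companion.
# Assumed workload: ~150K tokens/day of background context processing
# per user (an illustrative assumption, not a measured number).

tokens_per_day = 150_000
days = 30
monthly_mtok = tokens_per_day * days / 1_000_000  # 4.5 MTok/month

price_2023 = 30.0   # $/MTok (GPT-4 input at launch)
price_2026 = 0.15   # $/MTok (flash-tier input, cited above)

cost_2023 = monthly_mtok * price_2023
cost_2026 = monthly_mtok * price_2026

print(f"2023: ${cost_2023:,.0f}/user/month")  # $135
print(f"2026: ${cost_2026:,.2f}/user/month")  # under a dollar
```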
SaaS is dying because custom solutions cost less than subscriptions. When software becomes disposable, the economics of paying $500/month for a generic tool collapse. A non-technical person can describe their exact workflow to an agent, iterate for an hour, and have a custom tool that fits perfectly. But that hour of iteration burns hundreds of inference calls. This only works when those calls are effectively free. SaaS moats aren't being destroyed by better products. They're being destroyed by cheaper compute.
The super individual is a compute phenomenon. One person managing ten parallel agents, each handling a different part of a business — engineering, marketing, design, analysis, operations. That's ten simultaneous inference streams, running continuously. At $75/MTok output, this costs more than hiring the humans. At $3/MTok, it costs less than a single junior engineer's salary. The super individual isn't a new type of person. It's an old type of person with access to a new price point.
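The crossover is easy to see on paper. The output-token volume per agent is an illustrative assumption; the two price points are the ones from the paragraph above.

```python
# Back-of-envelope: ten always-on agent streams vs. one junior salary.
# Daily output volume per agent is an assumption for illustration.

agents = 10
output_mtok_per_agent_per_day = 2   # 2M output tokens/agent/day (assumed)
workdays_per_year = 250

yearly_mtok = agents * output_mtok_per_agent_per_day * workdays_per_year

cost_at_75 = yearly_mtok * 75.0   # early-GPT-4-era output pricing
cost_at_3 = yearly_mtok * 3.0     # Sonnet-tier output pricing cited above

print(f"at $75/MTok: ${cost_at_75:,.0f}/year")  # $375,000
print(f"at $3/MTok:  ${cost_at_3:,.0f}/year")   # $15,000
```

At the old price, the agent fleet costs a small team's payroll; at the new one, it undercuts a single junior hire. Same person, same workflow, different price point.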
The printing press moment is a cost story. When everyone becomes a developer through conversational iteration with AI, what's actually happening is that the cost of translating intent into software dropped below the threshold where non-technical people can participate. Each iteration — "make it more like this," "add a feature that does X," "fix this edge case" — is an inference call. Democratized creation requires democratized compute.
I've written close to 200 blog posts over the past two months. When I trace each one to its root cause, I arrive at the same place every time: compute. Whether the constraint was model capability (needing more training compute) or deployment cost (needing more efficient inference compute), the root variable was always the same.
This isn't correlation. It's a causal flywheel: cheaper compute makes new applications possible, which creates new demand, which attracts investment in more compute capacity, which drives costs down further. Every turn of the flywheel expands the boundary of what's economically viable. The trends aren't independent phenomena. They're symptoms of the same underlying variable moving.
And there's a bottleneck most people outside the industry don't see: GPUs are still severely supply-constrained.
The numbers tell the story: Meta has stockpiled 350K H100s and plans to reach 1.3M GPUs by year-end. Microsoft holds ~500K H100-equivalent accelerators. Nvidia sold approximately 3M H100-class chips between 2022 and 2024 (Epoch AI), but Blackwell is already sold out through mid-2026 with a 3.6M-unit backlog. Data center GPU lead times run 36-52 weeks.
Jensen Huang at GTC 2026 last week said he sees $1 trillion in orders — double from a year ago. His words: "chips are more of a limiter than even power." Smaller companies and neoclouds face months-long wait lists and severely limited allocations. The demand for compute isn't just growing. It's outrunning supply. Compute isn't just a cost problem. It's a supply problem. The flywheel only spins as fast as the chips come off the line.
Models Depreciate, Infrastructure Appreciates
GPT-4 was the undisputed frontier when it launched in March 2023. By July 2024, Llama 3.1 405B had surpassed it on MMLU — within 18 months, an open-weight model running on commodity hardware beat the model that defined an era. By early 2026, GPT-4-level capability is available at roughly $0.50/MTok — a 60x reduction. The model that defined an era became a commodity in under three years.
This isn't unique to GPT-4. It's the structural reality of the model layer. Models have a half-life of roughly 12-18 months. Research confirms the pace: equivalent-capability inference costs decline 5-10x per year, with fully commoditized performance tiers dropping 40-900x (arXiv:2511.23455). The frontier model of today is the baseline of tomorrow and the legacy system of next year. Training costs are enormous — GPT-4's training run cost an estimated $100 million or more (Sam Altman disclosed "more than $100 million"; Epoch AI estimates ~$78M in pure compute) — but the output depreciates like a new car driving off the lot.
Infrastructure doesn't work this way.
A data center built today will run tomorrow's models. The power contracts, cooling systems, networking fabric, and physical space don't become obsolete when a new architecture paper drops on arXiv. An A100 cluster that trained GPT-4 now runs inference for dozens of smaller, more efficient models simultaneously. The silicon depreciates, sure — but the infrastructure around it holds value across generations.
And the GPU itself is only part of the picture. According to Leopold Aschenbrenner's analysis, GPUs account for roughly 40% of total cluster cost — the remaining 60% is networking, power, cooling, and facilities. A 5-year TCO model shows that 100 H100s cost $3M to purchase but $8.6M in total ownership — the GPUs are just 35% of the real cost. The durable infrastructure surrounding the chips holds value far longer than the silicon itself.
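The split falls out directly from the two totals cited above; the non-GPU remainder is everything durable around the chips.

```python
# The 5-year TCO figures cited above: 100 H100s, $3.0M purchase price,
# $8.6M total cost of ownership over five years.

gpu_purchase = 3.0    # $M, 100 H100s
total_tco_5yr = 8.6   # $M, 5-year total cost of ownership

# Networking, power, cooling, facilities, operations.
non_gpu = total_tco_5yr - gpu_purchase
gpu_share = gpu_purchase / total_tco_5yr

print(f"non-GPU spend:    ${non_gpu:.1f}M")  # $5.6M
print(f"GPU share of TCO: {gpu_share:.0%}")  # 35%
```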
This creates a value hierarchy that most people in AI haven't internalized:
Durable layer: Infrastructure (compute) + Relationships (cognitive accumulation). On the infrastructure side: data centers, power, networking, chip manufacturing. On the relationship side: the accumulated understanding an AI builds of a specific person or organization over months of interaction. Both appreciate with time and use.
Commodity layer: Models, code, SaaS. Anything that can be replicated, distilled, or rebuilt from scratch. Models get matched by cheaper alternatives. Code gets regenerated by agents. SaaS gets replaced by custom solutions. This entire layer is being compressed toward marginal cost.
Betting on a specific model is like betting on a specific website in 1998. Some will be huge. Most will be replaced. All of them will depend on the underlying infrastructure to exist.
Betting on compute infrastructure is betting on the internet backbone. It doesn't matter which website wins. They all need bandwidth.
This is why inference spending has already surpassed training spending. According to Deloitte's TMT 2026 forecast, inference reached 50% of total compute workload in 2025 (matching training), and will rise to two-thirds by 2026. The inference-optimized chip market is projected to grow from $20B to $50B+. Global AI data center capex is expected to hit $400-450B in 2026. Training is periodic. Inference is forever. Every user, every agent session, every always-on companion burns inference continuously. The infrastructure that serves inference is the load-bearing layer of the entire AI economy.
What the Chip Wars Are Really About
I've written a dedicated deep dive on this, so I'll keep this to what most analysis misses: the chip competition isn't a market battle. It's a philosophical disagreement about the nature of intelligence itself.
GPU (Nvidia): Intelligence is divergent. We don't know where AI architectures are heading. Transformers dominate today, but the field moves fast. The safe bet is flexibility — hardware that can run whatever comes next, even if it's suboptimal for what exists now. Nvidia's real moat isn't the silicon. It's CUDA — the ecosystem, the tooling, the muscle memory of every ML engineer who learned on it. Switching costs aren't just technical. They're cultural. This is a bet that the future is unpredictable, and flexibility is the only rational strategy.
TPU (Google): Intelligence is convergent. Matrix multiplication is the mathematical core of neural networks. Transformers are not a passing fad — they're a fundamental computational pattern. If you believe that, then optimizing ruthlessly for matrix operations is the right move. Google has a structural advantage here: they bet early that matrix operations would be the core of ML. TPU v1 was already running in Google's data centers by 2016 — a year before the June 2017 "Attention Is All You Need" paper existed. When Transformers emerged and came to dominate ML, it validated Google's bet — a fortunate alignment of vision and timing, not insider knowledge. System-level design — chip, interconnect, compiler, software stack — all co-optimized as one unit. When it works, the efficiency advantage is real. Anthropic's cost reductions after moving inference to TPU proved the economics.
ASIC (Groq and others): Intelligence is settled. The compute pattern is known. The architecture is stable. The remaining problem is pure engineering — make the known operation faster, cheaper, more power-efficient. This is the highest-conviction bet. If correct, the returns are enormous. If wrong — if a new paradigm shifts the workload even modestly — the hardware becomes expensive scrap.
Each chip architecture encodes a worldview about how much we know about intelligence.
My read: GPU wins for now, because the landscape is still shifting fast enough that flexibility has enormous option value. TPU has a real path if transformers persist as the dominant paradigm for another 3-5 years, which seems likely. ASIC is the riskiest bet: it assumes workload stability, though the current paradigm has held longer than most people realize, nearly a decade since the June 2017 "Attention Is All You Need" paper.
But here's what matters for this essay: regardless of who wins the chip war, the outcome is the same. Compute gets cheaper. The competition itself is the mechanism that drives the cost curve down. GPU, TPU, and ASIC designers are all racing to deliver more intelligence per dollar per watt. The philosophical disagreement about how to get there doesn't change where they're all going.
One Thread Through Everything
When I look back at everything I've written over the past two months, a single thread connects all of it.
When Software Becomes Disposable — disposable code is cheap code generation, which is cheap inference, which is cheap compute.
The Chat Ceiling — the ceiling breaks partly through new interaction modalities (wearables, ambient sensing), partly through always-on multi-modal inference dropping below the cost threshold for ambient presence. A compute problem on both fronts.
The Printing Press Moment — non-coders building software through iterative conversation. Each iteration is an inference call. Cheap iterations mean cheap compute.
The Super Individual — one person with ten parallel agents. Ten inference streams. Affordable only when compute is cheap enough.
The Amara Paradox — short-term overestimation happens because people extrapolate from compute costs that haven't arrived yet. Long-term underestimation happens because they can't imagine what becomes possible when pure AI catches up to human+AI collaboration in more and more domains. Both errors stem from misunderstanding the compute curve.
The Lobster Fever — narrative amplification of cheap compute's promise. The story runs ahead of the reality, but the underlying engine — compute getting cheaper — is real. The fever breaks when the cost curve delivers what the narrative promised.
TPU vs GPU — literally a deep dive on different approaches to making compute cheaper.
Every post is a different face of the same polyhedron. Rotate it and you see agents, companions, disposable software, super individuals, chip wars, market manias. But the shape underneath is always the same: compute cost.
This isn't a retrospective pattern I'm imposing. It's the framework I actually use to evaluate opportunities. When someone pitches me an AI product idea, my first question isn't about the model or the market. It's: at what compute price point does this become viable? If the answer is "current prices," it should already exist — and if it doesn't, something else is wrong. If the answer is "one more order of magnitude," it's probably 18-24 months away. If the answer is "two more orders of magnitude," it's interesting but premature.
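That heuristic can be written down. The cadence below is the essay's own "order of magnitude every 18-24 months" (midpoint: 21 months); the function and its parameters are my sketch of the rule of thumb, and its output is an order-of-magnitude guess, not a forecast.

```python
import math

def months_until_viable(current_price, required_price, months_per_10x=21):
    """Rough ETA for when an idea's required inference price arrives.

    months_per_10x=21 is the midpoint of the essay's 18-24 month
    order-of-magnitude cadence; swap in 12 for the faster 10x/year
    LLMflation rate. Prices are in $/MTok.
    """
    if required_price >= current_price:
        return 0.0  # viable at today's prices: should already exist
    orders = math.log10(current_price / required_price)
    return orders * months_per_10x

# "One more order of magnitude" lands in the 18-24 month window;
# two more orders is interesting but premature.
print(f"{months_until_viable(3.0, 0.3):.0f} months")   # 21
print(f"{months_until_viable(3.0, 0.03):.0f} months")  # 42
```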
Understand the compute cost curve and you can predict which AI applications become viable and when. Not which specific company will win — that depends on execution, distribution, timing, taste. But which categories of application become possible? That's largely a function of cost — with the caveat that capability breakthroughs can be genuinely unpredictable. A new architecture, a new training technique, a new scaling insight can unlock categories nobody saw coming. The cost curve is the most reliable predictor, but it's not the only variable.
Why Both Layers
This is why I work on both infrastructure and applications. Not because they're complementary businesses. Because they're complementary perspectives on the same underlying system.
Building products shows me what's possible today — where the cost constraints actually bite, which features die on the spreadsheet, which product categories are one price drop away from exploding. Every product I ship is a data point about the current boundary of affordability.
Working on GPU infrastructure shows me where costs are heading — chip roadmaps, power economics, data center financing, the rate at which supply is catching up to demand. Every infrastructure deal is a signal about the future boundary.
One perspective without the other is incomplete. Infrastructure people who don't build products overestimate demand timelines — they see the supply side clearly but guess at what applications will actually materialize. Product people who don't understand infrastructure underestimate what's coming — they design around today's constraints without seeing how fast those constraints are evaporating.
Two layers, one picture.
For the next 3-5 years, I think this means: compute costs keep dropping at roughly an order of magnitude every 18-24 months, while models keep getting smarter with each generation of training compute. Each drop in cost and each leap in capability will unlock a wave of applications that were previously uneconomical or technically impossible. GPU supply constraints may throttle the pace, though: when big tech absorbs thousands of GPUs per order and smaller players can barely get allocations, the compute flywheel has a governor on it. From the outside, the pattern will look like periodic "breakthroughs," but they're really cost thresholds being crossed. Agent workloads become universal. Personalized AI becomes a utility. The gap between "what AI can do" and "what's affordable to deploy" narrows until it's effectively zero for most use cases.
What I don't know is what happens after that. When compute is effectively free — when the cost of intelligence approaches zero — what reorganizes? What new scarcity emerges? I suspect it's something like taste, judgment, and the ability to know what's worth building in a world where building is trivially cheap. But I'm not confident enough to call it.
The one thing I'm confident about is the root variable. Everything traces back to compute — not just its cost, but its capability, its efficiency, and its availability. If you understand only one thing about where AI is going, understand the compute curve. Everything else is downstream.