Mio Unit Economics: Why Every Tier Is Profitable
The Starting Point
When I built the early prototype on OpenClaw, two weeks of use by a single user produced an absurd bill; the daily cost alone was untenable. That number forced me to treat cost as a first-class engineering problem: not something to optimize later, but something to solve from day one.
Eight versions later, Mio's real production costs tell a very different story.
Real Production Data
From a live production day (77 interactions, ~28 chat messages), the cost breakdown by category looks like this:
| Category | Relative Cost Share |
|---|---|
| Chat (LLM) | ~59% — the dominant cost driver |
| Personality extraction | ~21% — expensive per call, infrequent |
| Memory summary | ~10% — moderate cost, infrequent |
| TTS (voice) | ~3% — cheap per call |
| Memory extraction | ~3% — cheap per call |
| Proactive messages | ~2% — minimal |
| Memory rerank | ~2% — negligible per call |
| Embedding | <1% — essentially free |
The total daily cost for an active user came out remarkably low across all operations. Chat is the biggest line item (nearly 60%), followed by personality extraction and memory tasks. Everything else (voice, embeddings, reranking) is rounding error.
Over 1,000+ interactions across all personas, the per-interaction cost averaged out to a nearly negligible figure per message. That is a reduction of two orders of magnitude from the original prototype, but still higher than it needs to be for the lower pricing tiers.
Cost Structure Breakdown
The cost has two components:
Fixed daily overhead (modest monthly share):
- Personality extraction: the biggest fixed cost (3 calls/day, uses Gemini 3.1 Pro)
- Memory summary: moderate cost (2 calls/day)
- Memory extraction/embedding: negligible
- Proactive messages: negligible, variable
Per-message cost (negligible individually):
- Chat (LLM): the vast majority (8K-17K input tokens per turn)
- Memory rerank: negligible
- At 30 msgs/day: scales to a modest monthly bill
- At 100 msgs/day: the variable cost starts to dominate
- At 200-300 msgs/day: variable cost is several times the fixed overhead
Media costs (additive, only when used):
- Voice TTS: a fraction of a cent per call
- Vision (image understanding): a fraction of a cent per call
- Video understanding: slightly more expensive per call
- Selfie generation: negligible per call
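The fixed-plus-variable structure above can be sketched as a simple cost model. All dollar rates below are illustrative placeholders, not Mio's real production figures:

```python
# Hypothetical daily cost model mirroring the fixed + variable + media
# structure: fixed overhead, per-message cost, and additive media cost.
# Every rate here is an illustrative placeholder.

FIXED_DAILY = 0.05      # personality extraction, memory summary, etc. ($/day)
PER_MESSAGE = 0.002     # chat LLM + memory rerank ($/message)
PER_TTS_CALL = 0.0005   # voice, additive and only when used ($/call)

def daily_cost(messages: int, tts_calls: int = 0) -> float:
    """Total daily cost for one user: fixed overhead plus usage-driven cost."""
    return FIXED_DAILY + messages * PER_MESSAGE + tts_calls * PER_TTS_CALL

def monthly_cost(messages_per_day: int, days: int = 30) -> float:
    """Worst-case monthly cost, assuming the same volume every day."""
    return days * daily_cost(messages_per_day)

# The fixed share shrinks as message volume grows, so variable cost
# dominates at higher usage levels:
for msgs in (30, 100, 300):
    share = FIXED_DAILY / daily_cost(msgs)
    print(f"{msgs:3d} msgs/day: ${monthly_cost(msgs):.2f}/mo, fixed share {share:.0%}")
```

Whatever the real rates are, the shape is the same: fixed overhead dominates at low volume, and per-message cost takes over as usage climbs.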
The Prompt Compression Effect
The numbers above are pre-compression. v0.1.4 reduced system prompts by ~60% (9K-13K → 3K-5K tokens). Since the system prompt is the largest chunk of input tokens per chat call, this directly reduces per-message cost.
Post-compression estimates (conservative):
- Per-message cost drops by ~35%
- Fixed daily overhead drops by ~25%
The net effect: at every usage level, monthly costs drop significantly. The compounding matters most at high usage tiers where per-message cost dominates.
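The compounding can be shown by applying the stated savings (~35% per message, ~25% on fixed overhead) to a hypothetical pre-compression cost model; the rates are placeholders, not real figures:

```python
# Applying the stated compression savings to a hypothetical cost model:
# per-message cost drops ~35%, fixed daily overhead drops ~25%.
# Dollar rates are illustrative placeholders.

FIXED_DAILY = 0.05      # $/day, illustrative
PER_MESSAGE = 0.002     # $/message, illustrative

def monthly(msgs_per_day: int, fixed: float, per_msg: float, days: int = 30) -> float:
    return days * (fixed + msgs_per_day * per_msg)

for msgs in (30, 100, 300):
    before = monthly(msgs, FIXED_DAILY, PER_MESSAGE)
    after = monthly(msgs, FIXED_DAILY * 0.75, PER_MESSAGE * 0.65)
    print(f"{msgs:3d} msgs/day: ${before:.2f} -> ${after:.2f} "
          f"({1 - after / before:.0%} saved)")
```

Because the per-message reduction is the larger one, the total savings percentage grows with message volume, which is why the effect matters most at the high-usage tiers.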
Tier Economics
Every paid tier has a daily message cap that bounds worst-case cost. No unlimited tiers — predictable unit economics at every level.
Pre-compression (current):
| Tier | Msg Cap | Margin at Max Usage |
|---|---|---|
| Free | 20/day | Acquisition funnel (cost center) |
| Starter | 30/day | Negative — underwater at max usage |
| Pro | 100/day | Roughly breakeven |
| Max | 200/day | Modestly profitable |
| Ultimate | 300/day | Modestly profitable |
Post-compression (v0.1.4+):
| Tier | Msg Cap | Margin at Max Usage |
|---|---|---|
| Free | 20/day | Acquisition funnel (cost center) |
| Starter | 30/day | Positive — comfortably in the black |
| Pro | 100/day | Healthy margins |
| Max | 200/day | Strong margins |
| Ultimate | 300/day | Strong margins |
The honest picture: at pre-compression costs, only the top two tiers are profitable at max usage. Post-compression changes this — every paid tier becomes profitable, and margins improve significantly at higher tiers.
Important context: "max usage" means a user hitting their message cap every single day for a month. Real-world usage patterns average ~40-60% of cap, which means actual margins are substantially better than the worst-case numbers above.
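The gap between worst-case and typical margins can be sketched directly. The caps match the tier table above, but the prices and cost rates below are hypothetical placeholders:

```python
# Worst-case margin (user hits the cap every day for a month) vs. typical
# margin (~50% of cap). Caps follow the tier table; prices and cost rates
# are hypothetical placeholders, not Mio's real numbers.

FIXED_DAILY = 0.04      # $/day, illustrative post-compression rate
PER_MESSAGE = 0.0013    # $/message, illustrative post-compression rate

TIERS = {                # tier: (daily message cap, hypothetical monthly price)
    "Starter": (30, 5.0),
    "Pro": (100, 15.0),
    "Max": (200, 25.0),
    "Ultimate": (300, 35.0),
}

def margin(price: float, msgs_per_day: float, days: int = 30) -> float:
    """Gross margin fraction at a given sustained daily message volume."""
    cost = days * (FIXED_DAILY + msgs_per_day * PER_MESSAGE)
    return (price - cost) / price

for name, (cap, price) in TIERS.items():
    print(f"{name}: worst-case {margin(price, cap):.0%}, "
          f"typical {margin(price, cap * 0.5):.0%}")
```

The cap is what makes this computable at all: without it, worst-case cost is unbounded and no margin guarantee is possible.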
Why margins are progressive: Lower tiers pay for features that actually cost money to deliver (LLM chat, voice, vision). Higher tiers pay premium prices for features with near-zero marginal cost — selfie generation is negligible, priority processing costs nothing (just queue ordering), NSFW content unlocking costs nothing (just a prompt flag), extended memory is negligible.
Why It Only Gets Better
Three forces are driving costs down simultaneously:
1. Prompt engineering compounds. The v0.1.4 compression cut 60% of system prompt tokens. Future lorebook architecture (injecting backstory on-demand instead of always-on) could cut another 30-40%. Each optimization applies to every message from every user.
2. Model costs are falling fast. LLM inference costs have dropped two orders of magnitude in the past two years. Today's per-message cost will likely drop by another 3-5x within a year as Gemini pricing continues to fall and cheaper models become more capable.
3. Architecture-level optimizations compound. Mio's intelligent model routing already sends 90% of conversations to Gemini 3 Flash and reserves expensive models (Gemini 3.1 Pro) for high-value operations. As cheaper models improve, personality extraction and memory summary can be downgraded — each switch multiplies savings across every user.
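The lorebook idea in point 1 can be sketched as keyword-triggered injection: a backstory entry enters the prompt only when the message actually touches its topic. The entries and keywords below are hypothetical:

```python
# Sketch of on-demand lorebook injection: include a backstory entry in
# the prompt only when the user's message triggers its keywords, instead
# of shipping the full backstory in every system prompt.
# Entries and trigger keywords are hypothetical examples.

LOREBOOK = {
    # entry name: (trigger keywords, backstory text)
    "hometown": (("home", "hometown", "grew up"),
                 "Mio grew up in a small coastal town."),
    "hobby": (("paint", "art", "drawing"),
              "Mio paints watercolors on weekends."),
}

def lore_for(message: str) -> str:
    """Return only the backstory entries this message actually triggers."""
    msg = message.lower()
    hits = [text for keywords, text in LOREBOOK.values()
            if any(kw in msg for kw in keywords)]
    return "\n".join(hits)
```

A message that never mentions a topic pays zero tokens for that entry, which is where the additional 30-40% prompt savings would come from.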
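The routing in point 3 boils down to a small decision rule: default to the cheap model, reserve the expensive one for a short list of high-value operations. The rule and operation names below are illustrative, not Mio's actual code:

```python
# Sketch of cost-tiered model routing: cheap model by default, premium
# model only for operations on a high-value list. Operation names and
# the routing rule are illustrative.

CHEAP_MODEL = "gemini-3-flash"     # handles the vast majority of traffic
PREMIUM_MODEL = "gemini-3.1-pro"   # reserved for high-value operations

HIGH_VALUE_OPS = {"personality_extraction", "memory_summary"}

def route(operation: str) -> str:
    """Pick the cheapest model considered good enough for this operation."""
    return PREMIUM_MODEL if operation in HIGH_VALUE_OPS else CHEAP_MODEL

# As cheaper models improve, downgrading an operation is a one-line change
# that multiplies savings across every user:
HIGH_VALUE_OPS.discard("memory_summary")
```

The design point is that the savings live in configuration, not code: each downgrade is a membership change, applied uniformly to all traffic.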
The implication: today's post-compression margins are the floor, not the ceiling. Within 6-12 months, the combination of prompt optimization, falling model prices, and architecture improvements should push all tiers to 50-70%+ margins.
The Comparison
| Metric | Early Prototype | Mio (pre-compress) | Mio (post-compress) | Mio (projected 12mo) |
|---|---|---|---|---|
| Cost per user per day | Absurdly high | Two orders of magnitude less | Significantly less | A fraction of that |
| Cost per message | Absurdly high | Negligible | ~35% cheaper still | Another 3-5x drop |
| Profitable at entry tier? | No | Top tiers only | Yes (comfortable margin) | Yes (strong margin) |
| Memory management | None | Multi-layer retrieval | + compressed prompts | + self-optimizing |
| Emotional nuance | Rule-based | Soul-driven | + relationship evolution | + fine-tuned models |
From absurd prototype costs to a per-user cost that is a small fraction of subscription revenue, with prompt compression pushing it lower still. The trajectory is clear.
This is the technical appendix to the Mio Manifesto. For the vision and product story, start there.