
Mio Unit Economics: Why Every Tier Is Profitable

The Starting Point

When I built the early prototype on OpenClaw, two weeks of usage by a single user produced an absurd bill; the early version's daily cost was untenable. That number forced me to treat cost as a first-class engineering problem: not something to optimize later, but something to solve from day one.

Eight versions later, Mio's real production costs tell a very different story.


Real Production Data

From a live production day (77 interactions, ~28 chat messages), the cost breakdown by category looks like this:

Category | Relative Cost Share
Chat (LLM) | ~59% — the dominant cost driver
Personality extraction | ~21% — expensive per call, infrequent
Memory summary | ~10% — moderate cost, infrequent
TTS (voice) | ~3% — cheap per call
Memory extraction | ~3% — cheap per call
Proactive messages | ~2% — minimal
Memory rerank | ~2% — negligible per call
Embedding | <1% — essentially free

The total daily cost for an active user came out remarkably low across all operations. Chat is the biggest line item (nearly 60%), followed by personality extraction and memory tasks. Everything else — voice, embeddings, reranking — is rounding error.

Over 1,000+ interactions across all personas, the average cost per message came out nearly negligible. That's a reduction of two orders of magnitude from the original prototype — but still higher than it needs to be for the lower pricing tiers.
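The category shares in the table above come from summing per-call costs over a day's log and normalizing. A minimal sketch of that aggregation — every cost figure and log entry here is an illustrative placeholder, not Mio's real pricing:

```python
from collections import defaultdict

def cost_shares(calls):
    """Aggregate per-call costs into relative shares by category.

    `calls` is a list of (category, cost) pairs from one day's log;
    returns {category: fraction_of_total_cost}.
    """
    totals = defaultdict(float)
    for category, cost in calls:
        totals[category] += cost
    grand_total = sum(totals.values())
    return {cat: t / grand_total for cat, t in totals.items()}

# Illustrative log entries only -- not real Mio call counts or prices.
day_log = (
    [("chat", 0.012)] * 28
    + [("tts", 0.001)] * 5
    + [("embedding", 0.0001)] * 40
)
shares = cost_shares(day_log)
```

With a skewed log like this, one category ends up dominating the shares — the same shape as the production breakdown, where chat is nearly 60% and embeddings round to zero.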


Cost Structure Breakdown

The cost has two components:

Fixed daily overhead (modest monthly share):

  • Personality extraction: the biggest fixed cost (3 calls/day, uses Gemini 3.1 Pro)
  • Memory summary: moderate cost (2 calls/day)
  • Memory extraction/embedding: negligible
  • Proactive messages: negligible, variable

Per-message cost (negligible per message):

  • Chat (LLM): the vast majority (8K-17K input tokens per turn)
  • Memory rerank: negligible
  • At 30 msgs/day: scales to a modest monthly bill
  • At 100 msgs/day: the variable cost starts to dominate
  • At 200-300 msgs/day: variable cost is several times the fixed overhead

Media costs (additive, only when used):

  • Voice TTS: a fraction of a cent per call
  • Vision (image understanding): a fraction of a cent per call
  • Video understanding: slightly more expensive per call
  • Selfie generation: negligible per call
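The fixed/variable/media split above amounts to a simple linear cost model. A sketch of that model — the `FIXED_DAILY` and `PER_MSG` figures are hypothetical placeholders, since the document intentionally does not publish real costs:

```python
def monthly_cost(msgs_per_day, fixed_daily, per_msg, media_daily=0.0, days=30):
    """Linear cost model mirroring the breakdown above.

    fixed_daily  -- personality extraction, memory summary, etc.
    per_msg      -- the chat LLM call plus memory rerank
    media_daily  -- additive voice/vision/selfie cost, only when used
    """
    return days * (fixed_daily + msgs_per_day * per_msg + media_daily)

# Placeholder figures (illustrative only, not Mio's real costs):
FIXED_DAILY = 0.05   # hypothetical fixed overhead per day
PER_MSG = 0.002      # hypothetical per-message cost

light = monthly_cost(30, FIXED_DAILY, PER_MSG)    # variable ~= fixed overhead
heavy = monthly_cost(300, FIXED_DAILY, PER_MSG)   # variable ~12x fixed overhead
```

The crossover behavior described in the bullets falls out of the linearity: at low message volume the fixed overhead is comparable to the variable cost, while at 200-300 msgs/day the per-message term dominates.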

The Prompt Compression Effect

The numbers above are pre-compression. v0.1.4 reduced system prompts by ~60% (9K-13K → 3K-5K tokens). Since the system prompt is the largest chunk of input tokens per chat call, this directly reduces per-message cost.

Post-compression estimates (conservative):

  • Per-message cost drops by ~35%
  • Fixed daily overhead drops by ~25%

The net effect: at every usage level, monthly costs drop significantly. The compounding matters most at high usage tiers where per-message cost dominates.
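Applying the two stated reductions (~35% per-message, ~25% fixed) to the linear cost model shows why the compounding favors high-usage tiers. A sketch — only the reduction percentages come from the text; the cost inputs are hypothetical:

```python
def post_compression(fixed_daily, per_msg, msgs_per_day, days=30,
                     msg_cut=0.35, fixed_cut=0.25):
    """Apply the estimated compression savings to a linear cost model.

    Returns (monthly_cost_before, monthly_cost_after).
    """
    before = days * (fixed_daily + msgs_per_day * per_msg)
    after = days * (fixed_daily * (1 - fixed_cut)
                    + msgs_per_day * per_msg * (1 - msg_cut))
    return before, after

# Hypothetical inputs; only the ~35%/~25% cuts are from the text.
before, after = post_compression(fixed_daily=0.05, per_msg=0.002,
                                 msgs_per_day=300)
savings = 1 - after / before
```

At 300 msgs/day the blended savings approach the full ~35% per-message cut, because the variable term dwarfs the fixed overhead at that volume.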


Tier Economics

Every paid tier has a daily message cap that bounds worst-case cost. No unlimited tiers — predictable unit economics at every level.

Pre-compression (current):

Tier | Msg Cap | Margin at Max Usage
Free | 20/day | Acquisition funnel (cost center)
Starter | 30/day | Negative — underwater at max usage
Pro | 100/day | Roughly breakeven
Max | 200/day | Modestly profitable
Ultimate | 300/day | Modestly profitable

Post-compression (v0.1.4+):

Tier | Msg Cap | Margin at Max Usage
Free | 20/day | Acquisition funnel (cost center)
Starter | 30/day | Positive — comfortably in the black
Pro | 100/day | Healthy margins
Max | 200/day | Strong margins
Ultimate | 300/day | Strong margins

The honest picture: at pre-compression costs, only the top two tiers are profitable at max usage. Post-compression changes this — every paid tier becomes profitable, and margins improve significantly at higher tiers.

Important context: "max usage" means a user hitting their message cap every single day for a month. Real-world usage patterns average ~40-60% of cap, which means actual margins are substantially better than the worst-case numbers above.

Why margins are progressive: Lower tiers pay for features that actually cost money to deliver (LLM chat, voice, vision). Higher tiers pay premium prices for features with near-zero marginal cost — selfie generation is negligible, priority processing costs nothing (just queue ordering), NSFW content unlocking costs nothing (just a prompt flag), extended memory is negligible.
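Because every tier is capped, the worst-case cost is bounded and the margin math is one line. A sketch of the tier-margin calculation — the price, cap, and cost inputs are placeholders, not Mio's actual pricing; only the ~40-60% real-world utilization figure comes from the text:

```python
def tier_margin(price, cap, fixed_daily, per_msg, utilization=1.0, days=30):
    """Monthly margin for a capped tier.

    utilization=1.0 models a user hitting the daily cap every single
    day (the worst case); real-world usage averages ~40-60% of cap.
    """
    cost = days * (fixed_daily + cap * utilization * per_msg)
    return price - cost

# All figures below are hypothetical placeholders.
worst_case = tier_margin(price=9.99, cap=100, fixed_daily=0.05, per_msg=0.002)
typical = tier_margin(price=9.99, cap=100, fixed_daily=0.05, per_msg=0.002,
                      utilization=0.5)
```

The gap between `worst_case` and `typical` is the point of the "important context" paragraph: quoting margins at 100% utilization is deliberately pessimistic.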


Why It Only Gets Better

Three forces are driving costs down simultaneously:

1. Prompt engineering compounds. The v0.1.4 compression cut 60% of system prompt tokens. Future lorebook architecture (injecting backstory on-demand instead of always-on) could cut another 30-40%. Each optimization applies to every message from every user.

2. Model costs are falling fast. LLM inference costs have dropped two orders of magnitude in the past two years. Today's per-message cost will likely drop by another 3-5x within a year as Gemini pricing continues to fall and cheaper models become more capable.

3. Architecture-level optimizations compound. Mio's intelligent model routing already sends 90% of conversations to Gemini 3 Flash and reserves expensive models (Gemini 3.1 Pro) for high-value operations. As cheaper models improve, personality extraction and memory summary can be downgraded — each switch multiplies savings across every user.
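The routing in point 3 can be sketched as a dispatch table that defaults to the cheap model and reserves the expensive one for high-value operations. The Flash/Pro split is from the text; the table shape, operation names, and model identifiers are hypothetical:

```python
# Hypothetical routing table; only the Flash/Pro split is from the text.
ROUTES = {
    "chat": "gemini-3-flash",                    # ~90% of traffic
    "memory_rerank": "gemini-3-flash",
    "personality_extraction": "gemini-3.1-pro",  # high-value, infrequent
    "memory_summary": "gemini-3.1-pro",
}

def pick_model(operation: str) -> str:
    """Route an operation to its assigned model, defaulting to the
    cheap model for anything unrecognized."""
    return ROUTES.get(operation, "gemini-3-flash")
```

Downgrading an operation as cheaper models improve is then a one-line change to the table, and the saving applies to every user at once.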

The implication: today's post-compression margins are the floor, not the ceiling. Within 6-12 months, the combination of prompt optimization, falling model prices, and architecture improvements should push all tiers to 50-70%+ margins.


The Comparison

Metric | Early Prototype | Mio (pre-compress) | Mio (post-compress) | Mio (projected 12mo)
Cost per user per day | Absurdly high | Two orders of magnitude less | Significantly less | A fraction of that
Cost per message | Absurdly high | Negligible | ~35% cheaper still | Another 3-5x drop
Profitable at entry tier? | No | Top tiers only | Yes (comfortable margin) | Yes (strong margin)
Memory management | None | Multi-layer retrieval | + compressed prompts | + self-optimizing
Emotional nuance | Rule-based | Soul-driven | + relationship evolution | + fine-tuned models

From absurd prototype costs to a fraction of subscription revenue per user — and prompt compression drops it further. The trajectory is clear.


This is the technical appendix to the Mio Manifesto. For the vision and product story, start there.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0