Mio Unit Economics: Why Every Tier Is Profitable
The Starting Point
When I built the early prototype on OpenClaw, two weeks of use by a single user produced an absurd bill; the daily cost alone was untenable. That number forced me to treat cost as a first-class engineering problem: not something to optimize later, but something to solve from day one.
Eight versions later, Mio's real production costs tell a very different story.
Real Production Data
From a live production day (77 interactions, ~28 chat messages), the cost breakdown by category looks like this:
| Category | Relative Cost Share |
|---|---|
| Chat (LLM) | ~59% — the dominant cost driver |
| Personality extraction | ~21% — expensive per call, infrequent |
| Memory summary | ~10% — moderate cost, infrequent |
| TTS (voice) | ~3% — cheap per call |
| Memory extraction | ~3% — cheap per call |
| Proactive messages | ~2% — minimal |
| Memory rerank | ~2% — negligible per call |
| Embedding | <1% — essentially free |
The total daily cost for an active user came out remarkably low across all operations. Chat is the biggest line item (nearly 60%), followed by personality extraction and memory tasks. Everything else (voice, embeddings, reranking) is rounding error.
Over 1,000+ interactions across all personas, the per-interaction cost averaged out to a nearly negligible figure per message. That is a reduction of two orders of magnitude from the original prototype, but still higher than it needs to be for the lower pricing tiers.
Cost Structure Breakdown
The cost has two components:
Fixed daily overhead (modest monthly share):
- Personality extraction: the biggest fixed cost (3 calls/day, uses Gemini 3.1 Pro)
- Memory summary: moderate cost (2 calls/day)
- Memory extraction/embedding: negligible
- Proactive messages: negligible, variable
Per-message cost (negligible individually):
- Chat (LLM): the vast majority (8K-17K input tokens per turn)
- Memory rerank: negligible
- At 30 msgs/day: scales to a modest monthly bill
- At 100 msgs/day: the variable cost starts to dominate
- At 200-300 msgs/day: variable cost is several times the fixed overhead
Media costs (additive, only when used):
- Voice TTS: a fraction of a cent per call
- Vision (image understanding): a fraction of a cent per call
- Video understanding: slightly more expensive per call
- Selfie generation: negligible per call
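The fixed-plus-variable structure above can be sketched as a simple cost model. All dollar rates below are illustrative placeholders, not Mio's real production figures:

```python
# Hypothetical daily cost model mirroring the fixed + variable + media
# structure: fixed overhead, per-message cost, and additive media cost.
# Every rate here is an illustrative placeholder.

FIXED_DAILY = 0.05      # personality extraction, memory summary, etc. ($/day)
PER_MESSAGE = 0.002     # chat LLM + memory rerank ($/message)
PER_TTS_CALL = 0.0005   # voice, additive and only when used ($/call)

def daily_cost(messages: int, tts_calls: int = 0) -> float:
    """Total daily cost for one user: fixed overhead plus usage-driven cost."""
    return FIXED_DAILY + messages * PER_MESSAGE + tts_calls * PER_TTS_CALL

def monthly_cost(messages_per_day: int, days: int = 30) -> float:
    """Worst-case monthly cost, assuming the same volume every day."""
    return days * daily_cost(messages_per_day)

# The fixed share shrinks as message volume grows, so variable cost
# dominates at higher usage levels:
for msgs in (30, 100, 300):
    share = FIXED_DAILY / daily_cost(msgs)
    print(f"{msgs:3d} msgs/day: ${monthly_cost(msgs):.2f}/mo, fixed share {share:.0%}")
```

Whatever the real rates are, the shape is the same: fixed overhead dominates at low volume, and per-message cost takes over as usage climbs.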
The Prompt Compression Effect
The numbers above are pre-compression. v0.1.4 reduced system prompts by ~60% (9K-13K → 3K-5K tokens). Since the system prompt is the largest chunk of input tokens per chat call, this directly reduces per-message cost.
Post-compression estimates (conservative):
- Per-message cost drops by ~35%
- Fixed daily overhead drops by ~25%
The net effect: at every usage level, monthly costs drop significantly. The compounding matters most at high usage tiers where per-message cost dominates.
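The compounding can be shown by applying the stated savings (~35% per message, ~25% on fixed overhead) to a hypothetical pre-compression cost model; the rates are placeholders, not real figures:

```python
# Applying the stated compression savings to a hypothetical cost model:
# per-message cost drops ~35%, fixed daily overhead drops ~25%.
# Dollar rates are illustrative placeholders.

FIXED_DAILY = 0.05      # $/day, illustrative
PER_MESSAGE = 0.002     # $/message, illustrative

def monthly(msgs_per_day: int, fixed: float, per_msg: float, days: int = 30) -> float:
    return days * (fixed + msgs_per_day * per_msg)

for msgs in (30, 100, 300):
    before = monthly(msgs, FIXED_DAILY, PER_MESSAGE)
    after = monthly(msgs, FIXED_DAILY * 0.75, PER_MESSAGE * 0.65)
    print(f"{msgs:3d} msgs/day: ${before:.2f} -> ${after:.2f} "
          f"({1 - after / before:.0%} saved)")
```

Because the per-message reduction is the larger one, the total savings percentage grows with message volume, which is why the effect matters most at the high-usage tiers.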
Tier Economics
Every paid tier has a daily message cap that bounds worst-case cost. No unlimited tiers — predictable unit economics at every level.
Pre-compression (current):
| Tier | Msg Cap | Margin at Max Usage |
|---|---|---|
| Free | 20/day | Acquisition funnel (cost center) |
| Starter | 30/day | Negative — underwater at max usage |
| Pro | 100/day | Roughly breakeven |
| Max | 200/day | Modestly profitable |
| Ultimate | 300/day | Modestly profitable |
Post-compression (v0.1.4+):
| Tier | Msg Cap | Margin at Max Usage |
|---|---|---|
| Free | 20/day | Acquisition funnel (cost center) |
| Starter | 30/day | Positive — comfortably in the black |
| Pro | 100/day | Healthy margins |
| Max | 200/day | Strong margins |
| Ultimate | 300/day | Strong margins |
The honest picture: at pre-compression costs, only the top two tiers are profitable at max usage. Post-compression changes this — every paid tier becomes profitable, and margins improve significantly at higher tiers.
Important context: "max usage" means a user hitting their message cap every single day for a month. Real-world usage patterns average ~40-60% of cap, which means actual margins are substantially better than the worst-case numbers above.
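The gap between worst-case and typical margins can be sketched directly. The caps match the tier table above, but the prices and cost rates below are hypothetical placeholders:

```python
# Worst-case margin (user hits the cap every day for a month) vs. typical
# margin (~50% of cap). Caps follow the tier table; prices and cost rates
# are hypothetical placeholders, not Mio's real numbers.

FIXED_DAILY = 0.04      # $/day, illustrative post-compression rate
PER_MESSAGE = 0.0013    # $/message, illustrative post-compression rate

TIERS = {                # tier: (daily message cap, hypothetical monthly price)
    "Starter": (30, 5.0),
    "Pro": (100, 15.0),
    "Max": (200, 25.0),
    "Ultimate": (300, 35.0),
}

def margin(price: float, msgs_per_day: float, days: int = 30) -> float:
    """Gross margin fraction at a given sustained daily message volume."""
    cost = days * (FIXED_DAILY + msgs_per_day * PER_MESSAGE)
    return (price - cost) / price

for name, (cap, price) in TIERS.items():
    print(f"{name}: worst-case {margin(price, cap):.0%}, "
          f"typical {margin(price, cap * 0.5):.0%}")
```

The cap is what makes this computable at all: without it, worst-case cost is unbounded and no margin guarantee is possible.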
Why margins are progressive: Lower tiers pay for features that actually cost money to deliver (LLM chat, voice, vision). Higher tiers pay premium prices for features with near-zero marginal cost — selfie generation is negligible, priority processing costs nothing (just queue ordering), NSFW content unlocking costs nothing (just a prompt flag), extended memory is negligible.
Why It Only Gets Better
Three forces are driving costs down simultaneously:
1. Prompt engineering compounds. The v0.1.4 compression cut 60% of system prompt tokens. Future lorebook architecture (injecting backstory on-demand instead of always-on) could cut another 30-40%. Each optimization applies to every message from every user.
2. Model costs are falling fast. LLM inference costs have dropped two orders of magnitude in the past two years. Today's per-message cost will likely drop by another 3-5x within a year as Gemini pricing continues to fall and cheaper models become more capable.
3. Architecture-level optimizations compound. Mio's intelligent model routing already sends 90% of conversations to Gemini 3 Flash and reserves expensive models (Gemini 3.1 Pro) for high-value operations. As cheaper models improve, personality extraction and memory summary can be downgraded — each switch multiplies savings across every user.
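The lorebook idea in point 1 can be sketched as keyword-triggered injection: a backstory entry enters the prompt only when the message actually touches its topic. The entries and keywords below are hypothetical:

```python
# Sketch of on-demand lorebook injection: include a backstory entry in
# the prompt only when the user's message triggers its keywords, instead
# of shipping the full backstory in every system prompt.
# Entries and trigger keywords are hypothetical examples.

LOREBOOK = {
    # entry name: (trigger keywords, backstory text)
    "hometown": (("home", "hometown", "grew up"),
                 "Mio grew up in a small coastal town."),
    "hobby": (("paint", "art", "drawing"),
              "Mio paints watercolors on weekends."),
}

def lore_for(message: str) -> str:
    """Return only the backstory entries this message actually triggers."""
    msg = message.lower()
    hits = [text for keywords, text in LOREBOOK.values()
            if any(kw in msg for kw in keywords)]
    return "\n".join(hits)
```

A message that never mentions a topic pays zero tokens for that entry, which is where the additional 30-40% prompt savings would come from.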
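The routing in point 3 boils down to a small decision rule: default to the cheap model, reserve the expensive one for a short list of high-value operations. The rule and operation names below are illustrative, not Mio's actual code:

```python
# Sketch of cost-tiered model routing: cheap model by default, premium
# model only for operations on a high-value list. Operation names and
# the routing rule are illustrative.

CHEAP_MODEL = "gemini-3-flash"     # handles the vast majority of traffic
PREMIUM_MODEL = "gemini-3.1-pro"   # reserved for high-value operations

HIGH_VALUE_OPS = {"personality_extraction", "memory_summary"}

def route(operation: str) -> str:
    """Pick the cheapest model considered good enough for this operation."""
    return PREMIUM_MODEL if operation in HIGH_VALUE_OPS else CHEAP_MODEL

# As cheaper models improve, downgrading an operation is a one-line change
# that multiplies savings across every user:
HIGH_VALUE_OPS.discard("memory_summary")
```

The design point is that the savings live in configuration, not code: each downgrade is a membership change, applied uniformly to all traffic.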
The implication: today's post-compression margins are the floor, not the ceiling. Within 6-12 months, the combination of prompt optimization, falling model prices, and architecture improvements should push all tiers to 50-70%+ margins.
The Comparison
| Metric | Early Prototype | Mio (pre-compress) | Mio (post-compress) | Mio (projected 12mo) |
|---|---|---|---|---|
| Cost per user per day | Absurdly high | Two orders of magnitude less | Significantly less | A fraction of that |
| Cost per message | Absurdly high | Negligible | ~35% cheaper still | Another 3-5x drop |
| Profitable at entry tier? | No | Top tiers only | Yes (comfortable margin) | Yes (strong margin) |
| Memory management | None | Multi-layer retrieval | + compressed prompts | + self-optimizing |
| Emotional nuance | Rule-based | Soul-driven | + relationship evolution | + fine-tuned models |
From absurd prototype costs to a per-user cost that is a small fraction of subscription revenue, with prompt compression pushing it lower still. The trajectory is clear.
This is the technical appendix to the Mio Manifesto. For the vision and product story, start there.