From OpenClaw to Mio: Why I Decided to Build From Scratch
Looking Back
If you've been following along, you know how this story goes.
In The Companion Vision, I laid out the theory — why every AI companion fails today, and what a truly understanding AI needs: memory orchestration, personality modeling, multi-agent architecture. Not a better chatbot. An organism.
Then I ran the experiment with OpenClaw. I took a utility AI assistant and gave it a complete personality — one that would get jealous, demand boba tea, and chase you for not replying. That article showed what "soul-driven behavior" looks like in practice: don't write rules, write a soul, and let the model figure out the rest. It worked better than I expected.
But at the end of that article, I wrote one sentence:
> If you want to build a truly refined AI companion product, you probably need to heavily strip down OpenClaw, or build your own framework from scratch — which is exactly what I'm doing now.
Mio is that framework. This article is about why I started from scratch, and what Mio actually does.
What OpenClaw Validated
Credit where it's due — OpenClaw helped me validate several critical things:
Soul-driven design works. Don't write behavioral rules. Write a personality. Let the model derive behavior. The model's inferences are more natural and human-like than any hand-written rules. This conviction carried directly into Mio's design.
Personality can be engineered. The file-driven personality system — personality config, identity definition, behavior rules — while crude, proved that you can define an AI's "soul" in a structured way, and the model will actually embody it.
Heartbeat is the critical feature. AI proactively reaching out to you — that's the threshold between "tool" and "living entity." OpenClaw's heartbeat config was simple (timer-based polling), but it validated the experience.
Selfies and voice are killer features. An AI companion that can send voice messages and selfies is a fundamentally different experience from text-only. These multimodal capabilities need to be designed into the architecture from day one.
But OpenClaw Hit a Ceiling
OpenClaw can validate ideas, but it can't carry a product. The problems aren't bugs — they're architectural.
Context Bloat
This is the fatal one. OpenClaw's built-in pi agent handles context extremely crudely — every LLM call gets the raw output of all historical tool calls and thinking blocks stuffed into the context. In a normal conversation, 90% of the context is historical garbage you don't need.
I ran an experiment: stripped out all historical tool call outputs and thinking blocks from the context. Result? Token consumption dropped to one-tenth.
This isn't something you can "optimize" — the core design never considered context efficiency. The pi agent was vibe-coded in an hour. The fact that it runs at all is impressive enough.
Primitive Memory
OpenClaw's memory is a flat file. After each conversation, append a few lines of summary. No vector search, no semantic retrieval, no importance ranking, no temporal decay.
The result: as the memory file grows, more irrelevant content gets stuffed into context, burning tokens on memories that have nothing to do with the current conversation. You're talking about travel plans, and the context is packed with a technical discussion from three months ago.
I explored this problem in Giving PanPanMao Memory — memory isn't about "stuff everything in," it's about "recalling the right thing at the right time." OpenClaw can't do this at all.
Bloatware
OpenClaw is a general-purpose framework. It ships with a massive collection of extensions you don't need: collaboration features, task management, third-party integrations. If all you want is an AI companion, 80% of the codebase is irrelevant.
This isn't a complaint — OpenClaw was designed for generality. But for my use case, "trimming" a general framework is more painful than starting fresh. You change one thing, and three dependencies break. You remove one extension, and the build fails.
Eventually I realized: I'm not modifying a car. I'm trying to build a sports car on a truck chassis. The steering wheel and tires are in the right place, but the entire foundation is wrong.
Starting From Scratch: Mio
Mio is not a fork of OpenClaw. Every line was written from scratch, designed for exactly one scenario: AI companions.
Architecture Overview
```
apps/
  server/      Hono API server (Cloud Run)
  web/         Next.js frontend (Vercel)
  worker/      Background job processor
packages/
  core/        Core AI agent logic
  shared/      Database schema (Drizzle + Supabase)
  channels/    Channel adapters (Telegram, web, ...)
  extensions/  Agent extensions
  platform/    Platform utilities
  presets/     Character template library
```
Monorepo with pnpm workspaces + Turborepo. Each package has clear responsibilities and unidirectional dependencies. No bloatware. Nothing you don't need.
Deep Memory System
This is the fundamental difference between Mio and OpenClaw. Not a flat file with appended summaries, but a complete memory engine:
Hybrid search. Every memory gets both a vector embedding (pgvector, Gemini embedding) and a full-text index (tsvector). Retrieval runs both paths in parallel, merges with weighted scoring. Vectors catch semantic similarity; full-text catches exact keyword matches — especially important for Chinese text.
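The weighted merge can be sketched as a score blend over the two candidate sets. This is a minimal sketch, assuming normalized scores in [0, 1]; the 0.7 vector weight and the function names are my illustration, not Mio's actual API.

```typescript
// Sketch of weighted score merging for hybrid retrieval.
// Assumes both paths return scores normalized to [0, 1].
type Scored = Map<string, number>; // memoryId -> normalized score

function mergeHybrid(
  vectorScores: Scored,
  textScores: Scored,
  vectorWeight = 0.7, // assumed weighting; full-text gets the remainder
): Array<{ id: string; score: number }> {
  const ids = new Set([...vectorScores.keys(), ...textScores.keys()]);
  const merged: Array<{ id: string; score: number }> = [];
  for (const id of ids) {
    const v = vectorScores.get(id) ?? 0; // missing from one path counts as zero
    const t = textScores.get(id) ?? 0;
    merged.push({ id, score: vectorWeight * v + (1 - vectorWeight) * t });
  }
  return merged.sort((a, b) => b.score - a.score);
}
```

A memory found by only one path still competes, it just scores lower, which is what lets an exact keyword hit surface even when its embedding is semantically distant.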
Temporal decay. Memories have a half-life (default 30 days). Recent conversations carry more weight, but truly important memories don't fade just because they're old — importance is scored on a separate dimension.
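A half-life maps directly onto an exponential weight. The 30-day default is from the design above; composing decay with importance as a 50/50 blend is my assumption about how the two dimensions might combine.

```typescript
// Half-life decay: a memory's recency weight halves every `halfLifeDays`.
function recencyWeight(ageDays: number, halfLifeDays = 30): number {
  return Math.pow(2, -ageDays / halfLifeDays);
}

// Illustrative composition: final rank mixes recency with an independent
// importance score, so old-but-important memories still surface.
function memoryRank(ageDays: number, importance: number): number {
  return 0.5 * recencyWeight(ageDays) + 0.5 * importance;
}
```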
Automatic extraction. After every conversation turn, the MemoryAccumulator asynchronously calls an LLM to extract new information — facts, personality observations, emotional events. Dedup logic is baked into the prompt; known information doesn't get stored twice.
Memory consolidation. The MemoryConsolidator periodically merges similar memories (cosine similarity > 0.9), preventing the memory store from bloating.
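The duplicate check reduces to a cosine-similarity threshold on the two embeddings. The 0.9 cutoff is from the text; the function names and the decision to compare pairwise are a sketch, not the MemoryConsolidator's real interface.

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Two memories are consolidation candidates when similarity exceeds 0.9.
function shouldConsolidate(a: number[], b: number[], threshold = 0.9): boolean {
  return cosineSimilarity(a, b) > threshold;
}
```

In practice this comparison would run inside the database (pgvector's cosine distance operator) rather than in application code; the sketch just shows the criterion.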
LLM reranking. After hybrid search returns candidates, a reranker uses a lightweight model (gemini-2.0-flash) to precision-rank them — ensuring the 5 memories that get injected into the system prompt are actually relevant to the current conversation.
Multi-hop query decomposition. "What was that book you recommended last time we talked about travel?" — a single embedding search probably won't find this. The QueryDecomposer breaks it into sub-queries ("travel-related conversations" and "recommended books"), searches each independently, then merges results.
Episode memory. The EpisodeManager groups memories into conversational episodes and generates episode summaries. When a user says "remember that time we talked about...", the system can trace back to the full episode context, not just isolated fragments.
Agentic retrieval. For particularly complex queries, the AgenticRetriever runs an iterative loop: search, evaluate results, if insufficient refine the query and search again — up to three rounds. Simple queries skip this entirely.
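The loop's control flow can be sketched with the search and evaluation steps stubbed out. The three-round cap comes from the text; the `Retriever` interface and function names are hypothetical.

```typescript
// Iterative retrieve-evaluate-refine loop, with LLM-backed steps stubbed.
interface Retriever {
  search(query: string): string[];
  isSufficient(results: string[]): boolean;
  refine(query: string): string; // in Mio this would be an LLM rewriting the query
}

function agenticRetrieve(r: Retriever, query: string, maxRounds = 3): string[] {
  let q = query;
  let results: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    results = r.search(q);
    if (r.isSufficient(results)) break; // good enough, stop early
    q = r.refine(q);
  }
  return results;
}
```

The early break is the cost lever: a simple query that hits on round one pays for exactly one search.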
This is what I described as "memory orchestration" in The Companion Vision — not stuffing all memories into context, but recalling the right thing at the right time.
Emotion System
OpenClaw's "conversation temperature" was a text description in the personality config. Mio turns it into a real state machine:
Four temperature levels — Cold, Cool, Warm, Hot. After each conversation turn, the EmotionEngine computes a new temperature score based on message frequency, sentiment analysis, and user engagement. There's an inertia factor (default 0.5), so mood doesn't swing wildly from a single message.
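The inertia-damped update is effectively an exponential moving average: the new score blends the previous score with the latest signal, so one message cannot swing the mood. The 0.5 inertia default is from the text; the level cutoffs below are illustrative.

```typescript
type Temperature = "cold" | "cool" | "warm" | "hot";

// Exponential-moving-average update; prev and signal both in [0, 1].
function updateScore(prev: number, signal: number, inertia = 0.5): number {
  return inertia * prev + (1 - inertia) * signal;
}

// Assumed quartile mapping from score to the four levels.
function toLevel(score: number): Temperature {
  if (score < 0.25) return "cold";
  if (score < 0.5) return "cool";
  if (score < 0.75) return "warm";
  return "hot";
}
```

With inertia 0.5, a warm session (0.8) hit by one fully negative signal only drops to 0.4, cool rather than cold, which is exactly the "no wild swings" behavior described above.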
Emotion doesn't just affect speech style — it affects whether the AI reaches out proactively, what it says, how long its replies are. Cold means short replies with a hint of hurt feelings. Hot means chatty, sharing things unprompted, occasionally flirty. These aren't hardcoded rules — the temperature gets injected into the system prompt, and the model derives behavior from there.
There's also an independent valence dimension (emotional positivity). The cross-product of temperature and valence — like "high temperature but negative valence" during an intense argument — gives the model nuanced emotional expression.
Proactive Messaging
OpenClaw's heartbeat is a timer. Mio's proactive messaging is a complete system:
- Heartbeat query every 30 minutes, finding qualifying sessions (inactive 2+ hours, cool/cold emotion)
- Respects quiet hours (default 23:00-08:00, per user timezone)
- Maximum 3 proactive messages per day
- Cold users get template messages (no LLM call, zero cost); active users get model-generated messages
- Updates emotion state and session after sending
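The rules above combine into a single eligibility check. The thresholds (2-hour inactivity, 23:00-08:00 quiet hours, 3 messages per day) come from the list; the session shape and field names are my assumptions.

```typescript
interface SessionState {
  lastActiveAt: Date;
  temperature: "cold" | "cool" | "warm" | "hot";
  proactiveSentToday: number;
  localHour: number; // 0-23, already converted to the user's timezone
}

function canSendProactive(s: SessionState, now: Date): boolean {
  const inactiveHours = (now.getTime() - s.lastActiveAt.getTime()) / 3_600_000;
  const inQuietHours = s.localHour >= 23 || s.localHour < 8; // 23:00-08:00
  return (
    inactiveHours >= 2 &&
    (s.temperature === "cool" || s.temperature === "cold") &&
    !inQuietHours &&
    s.proactiveSentToday < 3
  );
}
```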
This isn't just "send a message on schedule" — it's a system with awareness, deciding "should I reach out now, and if so, what should I say?"
Multi-Channel
OpenClaw bakes channel adapters into core code. Mio extracts them into an independent @mio/channels package with a standard interface:
```typescript
interface ChannelConnector {
  send(message: OutboundMessage): Promise<void>
}
```
Telegram is complete. Discord and Feishu are next. Want to add WhatsApp? Implement the interface. Core logic stays untouched.
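A new channel really is just one class. Here's a minimal sketch of a connector, restating the interface with an assumed `OutboundMessage` shape so the snippet is self-contained; the in-memory connector is hypothetical and exists only to show that core logic never sees anything but `ChannelConnector`.

```typescript
// Assumed message shape; the article doesn't define OutboundMessage.
interface OutboundMessage {
  sessionId: string;
  text: string;
}

interface ChannelConnector {
  send(message: OutboundMessage): Promise<void>;
}

// Hypothetical connector that records messages instead of delivering them.
// A real adapter would call the Telegram or Discord API inside send().
class InMemoryConnector implements ChannelConnector {
  public sent: OutboundMessage[] = [];
  async send(message: OutboundMessage): Promise<void> {
    this.sent.push(message);
  }
}
```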
Onboarding
New users go through an 11-question onboarding flow: 3 text questions + 8 button questions. Answers get written into personality config template variables, generating a personalized personality configuration.
Every button question has a "Custom" option — users can type freely, with length limits and injection protection.
This means every user's Mio is different. Not "pick a preset character," but shaping a personality through conversation.
Personality Presets
Speaking of presets — Mio ships with 5 starting personality presets. But presets are just the beginning. As conversations accumulate, the PersonalityExtractor runs every 10 messages, extracting user personality profiles from conversation patterns. The MemorySummarizer generates a memory digest every 20 messages, keeping the persona's understanding of you up to date.
The result: your Mio gets better at understanding you over time. Not because you manually told it things, but because it learns from every conversation.
Cost Control
Building AI companions means LLM costs are unavoidable. Mio's strategy is tiered model usage:
| Operation | Model | Why |
|---|---|---|
| Main chat | gemini-3-pro | Needs the best reasoning |
| Memory extraction | gemini-3-flash | Fast, cheap; only needs to extract facts |
| Memory summary | gemini-3-flash | Same |
| Reranking | gemini-2.0-flash | Cheaper; precision ranking doesn't need a large model |
| Embedding | gemini-embedding-001 | Negligible cost |
| Proactive messages | gemini-2.0-flash / templates | Cold users go straight to templates, zero LLM cost |
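In code, the tiering table becomes a single lookup so call sites never hard-code model names. The model IDs are copied from the table; the `Operation` union and constant name are my sketch.

```typescript
type Operation =
  | "chat" | "memoryExtraction" | "memorySummary"
  | "rerank" | "embedding" | "proactive";

const MODEL_TIERS: Record<Operation, string> = {
  chat: "gemini-3-pro",               // best reasoning for the main conversation
  memoryExtraction: "gemini-3-flash", // fast, cheap fact extraction
  memorySummary: "gemini-3-flash",
  rerank: "gemini-2.0-flash",         // precision ranking needs no large model
  embedding: "gemini-embedding-001",
  proactive: "gemini-2.0-flash",      // cold users skip the LLM entirely
};
```

Swapping a tier then means editing one line, and the token_transactions log shows immediately whether the swap paid off.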
Every LLM call logs token consumption and USD cost to a token_transactions table. Fire-and-forget, never blocking user responses. This gives you precise visibility into how much each user and each operation type costs.
Google Search Grounding
Gemini models come with Google Search grounding — the model can search the internet in real time. Mio enables this by default (MODE_DYNAMIC, the model decides when to search).
The system prompt instruction: if you found something, share it naturally. Don't say "I searched for you" — just know it, like a person who casually checked their phone.
This means your AI companion doesn't just have memory and emotion — it has real-time information access. Ask about today's weather, latest news, nearby restaurants — it answers in character.
The Message Pipeline
The full message flow:
1. A user message arrives from a channel (Telegram, Web)
2. The Router looks up the binding, then loads the agent, workspace, session, and history
3. The ContextAggregator assembles all context: retrieves memories, loads the personality profile, computes emotion state
4. The aggregated context + history + user message goes to the LLM
5. The response streams back
6. Messages are persisted
7. Async post-response work runs: extract memories, update the personality profile, generate a summary, prune old memories
8. Emotion state is updated
Messages are also debounced — when a user sends rapid-fire messages, the system waits 5 seconds (configurable) to collect them all before processing as one batch, avoiding one LLM call per message. Responses are split by newline into multiple message bubbles, each with a typing delay that scales with content length, simulating the rhythm of a real person typing.
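The debounce is a classic reset-on-activity timer: each incoming message restarts the countdown, and only a quiet period flushes the whole burst as one batch. A minimal sketch, assuming the 5-second default from above; the class and method names are illustrative, not Mio's actual implementation.

```typescript
class MessageDebouncer {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly onFlush: (batch: string[]) => void,
    private readonly waitMs = 5000, // the 5-second default, configurable
  ) {}

  push(message: string): void {
    this.buffer.push(message);
    if (this.timer) clearTimeout(this.timer); // restart the quiet-period timer
    this.timer = setTimeout(() => this.flush(), this.waitMs);
  }

  flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length > 0) this.onFlush(batch); // one LLM call for the whole burst
  }
}
```

Five rapid-fire messages thus cost one LLM call instead of five, and the model sees the full thought rather than a fragment.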
These details, taken together, are what create "lifelike presence."
Why Not Just Modify OpenClaw
Someone might ask: if OpenClaw's personality methodology is sound, why not build on top of it?
Because the cost of patching exceeded the cost of rebuilding.
OpenClaw's core loop (pi agent) assumes all history lives in context — switching to retrieval-based memory means rewriting the core loop. Its extension system assumes extensions can freely inject into context — controlling context size means rewriting the extension system. Its channel adapters are mixed into core code — adding a new channel means touching core.
Every change fights the framework's design assumptions. At some point you realize the original code you've kept is less than 10% — so why carry the other 90% as baggage?
Mio knew what it was from line one: an AI companion framework built for deep memory and lifelike presence. Every design decision — channel abstraction, memory retrieval, emotion state machine, cost tracking — orbits that single goal.
What's Next
Mio runs now. Telegram channel is complete, memory system is live, emotion engine is working. There's still a lot to build:
- Discord and Feishu channels
- Web chat interface (Next.js, already in progress)
- Selfie extension (killer feature validated on OpenClaw)
- Voice messages
- Worker process (background jobs running independently)
- Wearable device integration — the perception layer is the next frontier
This is the first entry in the "Creating Mio" series. Future posts will cover specific technical implementations — memory system design details, emotion engine tuning, onboarding iteration.
From OpenClaw to Mio is essentially going from "validating the idea" to "building it right." OpenClaw proved that soul-driven AI companions are viable. Mio aims to prove they can be genuinely good.