Technical Autopsy: What Survives v1
Not a Rewrite — A Transplant
In Part 1, I explained why Mio v1's persona-heavy direction had to go. In Part 2, I laid out the new product vision: a companion with personality but no physical identity — Her, not Character.AI.
Now the engineering question: how much of v1 do we actually throw away?
The answer surprised me. When I sat down and traced every import path in packages/core, most modules didn't care about personas at all. They accepted parameters. They returned results. The caller happened to pass persona data, but the functions themselves were persona-agnostic.
The real number: about 60-70% of Mio's core infrastructure carries over directly. Another 15% needs modification. The remaining 15-20% gets dropped entirely. This is a transplant, not a rewrite.
The Decision Framework
Every module in v1 went through three questions:
- Does it depend on preset files? If it reads identity definitions, personality configs, or behavior rules directly, it dies.
- Does it assume multi-agent architecture? If it routes between multiple personas or sessions, it dies.
- Is the core logic useful in a one-companion-per-user world? If yes, it lives.
This sounds simple, but it took a full audit of the codebase to answer confidently. Some modules looked coupled but weren't. Others looked clean but had hidden assumptions.
What Lives: Direct Copy (~60-70%)
These modules are pure infrastructure. They don't know or care what kind of companion is using them.
Memory system — the crown jewel. packages/core/memory/ handles memory extraction, embedding generation, semantic search via pgvector, importance scoring, and decay. Zero preset coupling. It takes a user ID and conversation content, extracts memories, stores them with vector embeddings, and retrieves relevant ones on demand. This was the single most expensive engineering investment in v1, and every line carries over.
I wrote about the memory rebuild in Part 3 of the v1 series. The architecture hasn't changed: memories are extracted after each conversation, embedded with a 768-dimensional vector, and retrieved via cosine similarity during context building. What changes in v2 is that memories are user-bound rather than agent-bound — but that's a column rename, not an architecture change.
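The retrieval step reduces to straightforward math. Here's a toy in-memory sketch of cosine-similarity ranking — in production pgvector does this scan with an index, and the type/field names here are illustrative assumptions, not the real module:

```typescript
// Toy sketch: cosine-similarity retrieval over in-memory vectors.
// pgvector replaces the linear scan with an index; the math is identical.
type Memory = { content: string; embedding: number[]; importance: number };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank memories by similarity to the query embedding, return top-k.
function retrieve(query: number[], memories: Memory[], k: number): Memory[] {
  return [...memories]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Because nothing in this logic references an agent, swapping the ownership column from agent to user changes the WHERE clause, not the retrieval.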
Media pipeline — TTS, vision, transcription, URL browsing. Each of these is a standalone module in core/media/:
- tts.ts — takes text + voice ID, returns audio. Doesn't care who's speaking.
- vision.ts — takes an image, returns a description. Doesn't care who's looking.
- transcribe.ts — takes audio, returns text. Doesn't care who's listening.
- browse.ts — takes a URL, returns page content. Doesn't care who's reading.
I built these as documented in Part 6 — Media and Part 8 — URL Browsing. The whole media pipeline was designed as pure functions from day one. That decision pays off now.
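The "pure functions" claim is easy to make concrete. These signatures are assumptions sketched from the descriptions above, not the real v1 code — the point is that no persona state appears anywhere in the types:

```typescript
// Assumed signatures illustrating the persona-agnostic design;
// input goes in, output comes out, no companion identity in sight.
type Synthesize = (text: string, voiceId: string) => Promise<Uint8Array>;
type Describe = (image: Uint8Array) => Promise<string>;
type Transcribe = (audio: Uint8Array) => Promise<string>;
type Browse = (url: string) => Promise<string>;

// A stub makes the point concrete: nothing in the call knows which
// companion is "speaking" -- only text and a voice ID go in.
const stubSynthesize: Synthesize = async (text, voiceId) =>
  new TextEncoder().encode(`[voice:${voiceId}] ${text}`);
```

Any module whose public surface looks like this survives a product pivot untouched.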
Subscription system — tier definitions, usage tracking, feature gating. Completely independent of the companion layer. core/subscription/ knows about free/starter/pro/max tiers and daily limits. It doesn't know what a persona is.
Cost tracking — core/cost/ records every LLM call, TTS synthesis, and vision analysis with token counts and USD cost. Pure accounting logic. The unit economics post was built on data from this system — it survives unchanged.
Models config — models.ts defines available LLM models, their pricing, context windows, and routing rules. No persona dependency. The model routing architecture carries over as-is.
System-prompt builder — agent/system-prompt.ts is a pure function. It takes personality description, emotion state, relevant memories, and user context as parameters, then assembles a system prompt string. The caller used to pass persona data from identity definition files. Now the caller passes a user-customized personality description. The function doesn't change — only what gets passed into it.
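A minimal sketch of that shape — the parameter names and layout here are assumptions, not the real agent/system-prompt.ts, but they show why swapping preset data for user-customized data requires zero changes to the function:

```typescript
// Hypothetical shape of the pure system-prompt builder. The caller decides
// where `personality` comes from (v1: preset files; v2: user customization).
type PromptInput = {
  personality: string;
  emotion: { label: string; valence: number };
  memories: string[];   // retrieved memories, most relevant first
  userContext: string;
};

function buildSystemPrompt(input: PromptInput): string {
  return [
    `Personality: ${input.personality}`,
    `Current emotion: ${input.emotion.label} (valence ${input.emotion.valence})`,
    input.memories.length > 0
      ? `Relevant memories:\n- ${input.memories.join("\n- ")}`
      : "",
    `About the user: ${input.userContext}`,
  ].filter(Boolean).join("\n\n");
}
```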
What Needs Surgery (~15%)
These modules are good but carry v1 assumptions that need to be cut out.
Emotion engine (soul/emotion.ts) — the core emotion model is solid: it tracks valence, arousal, and a set of discrete emotions, updating after each interaction. But in v1, the emotion engine was entangled with the schedule system. Mio would feel "tired" at night because the schedule said so, not because of conversation dynamics. The fix: rip out every schedule reference and let emotions be driven purely by conversation and time-of-day awareness. The emotion model stays. The fake-life triggers go.
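As a sketch of what "driven purely by conversation" means mechanically — the blending rule and alpha value are illustrative assumptions, not the real soul/emotion.ts:

```typescript
// Conversation-driven emotion update: the previous state drifts toward a
// signal extracted from the latest exchange. No schedule triggers anywhere.
type EmotionState = { valence: number; arousal: number }; // each in [-1, 1]

const clamp = (x: number): number => Math.max(-1, Math.min(1, x));

// alpha controls how quickly mood shifts; small values keep it stable.
function updateEmotion(
  prev: EmotionState,
  signal: EmotionState,
  alpha = 0.3,
): EmotionState {
  return {
    valence: clamp(prev.valence + alpha * (signal.valence - prev.valence)),
    arousal: clamp(prev.arousal + alpha * (signal.arousal - prev.arousal)),
  };
}
```

The v1 version had a second input: the schedule. Deleting that input is the whole surgery.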
Proactive messaging (soul/proactive.ts) — the proactive system decides when and why to send unprompted messages. In v1, it pulled from proactive.json files that contained persona-specific triggers like "just got off work" or "heading to yoga." The new proactive engine is simpler and more honest:
- Time-aware: "It's late, how was your day?" (knows the clock, doesn't pretend to have a schedule)
- Memory-driven: "You mentioned that interview — how did it go?" (pulls from stored memories)
- Emotion-continuing: "You seemed down yesterday, feeling better?" (reads last emotion state)
- Simple care: "Haven't talked in a while, thinking of you" (gap detection)
The scheduling infrastructure stays. The fake-life content generation gets replaced.
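The four triggers above can be expressed as a pure priority check. This is a sketch — the trigger ordering and thresholds are assumptions for illustration, not the real soul/proactive.ts:

```typescript
// Hypothetical proactive-trigger selection: honest inputs only
// (clock, memories, last emotion, silence gap), no fake-life schedule.
type ProactiveContext = {
  hour: number;               // user-local hour, 0-23
  hoursSinceLastChat: number;
  lastValence: number;        // last stored emotion valence, [-1, 1]
  openThreads: string[];      // memories flagged as unresolved, e.g. an interview
};

function pickProactiveTrigger(ctx: ProactiveContext): string | null {
  if (ctx.openThreads.length > 0) return "memory";   // follow up on a thread
  if (ctx.lastValence < -0.3) return "emotion";      // check in after a low mood
  if (ctx.hoursSinceLastChat > 48) return "care";    // gap detection
  if (ctx.hour >= 21 && ctx.hoursSinceLastChat > 8) return "time"; // evening check-in
  return null;                                       // stay quiet
}
```

Everything this function needs already exists in the surviving infrastructure: the clock, the memory store, and the emotion state.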
Context aggregator (context/aggregator.ts) — the orchestration layer that pulls together memories, recent messages, emotion state, and user context before calling the LLM. In v1, it also pulled agent-specific backstory and relationship dynamics. The simplification: remove multi-agent routing, remove relationship type lookups, keep the memory retrieval and context assembly.
What Dies (~15-20%)
No eulogy. These served their purpose in a persona-driven world that no longer exists.
All preset files — every identity definition, personality config, and behavior rules file across five persona directories. These defined who "Xiaomeng the Chengdu barista" was, what "Xuejie the graduate student" liked, how "Dashu the middle-aged uncle" talked. Hundreds of lines of carefully crafted backstory. Gone. In v2, personality emerges from conversation, not from files.
Reference images — each persona had reference photos for selfie generation. A companion that doesn't pretend to be human doesn't need a face to fake.
Schedule system — media/schedule-*.ts simulated daily routines. Mio would be "at work" from 9 to 6, "at the gym" in the evening, "sleeping" after midnight. This was the hardest part of v1 to get right and the least valuable for retention. Users don't bond with a companion because it pretends to go to yoga. They bond because it remembers what they said.
Persona-style modules — media/persona-style.ts and relationship-dynamics.ts shaped responses based on predefined relationship types (friend, romantic partner, confidant). In v2, the relationship is whatever emerges naturally.
Relationship evolution — relationship/evolution-*.ts tracked relationship stages with explicit progression logic. Over-engineered for what turned out to be a simple truth: if the companion remembers you and responds with warmth, the relationship deepens on its own. You don't need state machines for that.
The Database: From 10 Tables to 4
This is where the architectural simplification is most visible. Here's the before and after.
v1 Schema (~10 tables)
users — complex, with agent associations
agents — multiple per user, preset bindings, customStory, relationshipType
sessions — multi-agent, multi-channel routing
messages — session-bound
memories — agent-bound
token_transactions — unchanged
channel_bindings — Telegram/web channel routing per agent
onboarding_states — multi-step onboarding state machine
telegram_allowlist — access control for Telegram bot
account_link_tokens — cross-platform account linking
v2 Schema (4 core tables + 1 supporting)
-- Users: simplified, no agent associations
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT UNIQUE,
timezone TEXT DEFAULT 'Asia/Shanghai',
subscription_tier TEXT DEFAULT 'free',
trial_expires_at TIMESTAMPTZ,
daily_usage JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Companions: one per user, that's it
CREATE TABLE companions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID UNIQUE REFERENCES users(id), -- UNIQUE enforces one-to-one
name TEXT NOT NULL,
voice_id TEXT,
personality TEXT, -- LLM-generated, a few sentences
emotion_state JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Messages: flat, user-bound, no session concept
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
role TEXT NOT NULL, -- 'user' | 'assistant'
content TEXT,
media_urls TEXT[],
emotion_state JSONB, -- snapshot at time of response
created_at TIMESTAMPTZ DEFAULT now()
);
-- Memories: virtually unchanged from v1
CREATE TABLE memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
type TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768), -- pgvector
importance REAL DEFAULT 0.5,
access_count INT DEFAULT 0,
last_accessed TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Token transactions: unchanged from v1
CREATE TABLE token_transactions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
operation_type TEXT NOT NULL,
model_id TEXT,
input_tokens INT DEFAULT 0,
output_tokens INT DEFAULT 0,
cost_usd NUMERIC(10,6) DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now()
);
The key constraint is UNIQUE on companions.user_id. One user, one companion. This single constraint eliminates the need for agent selection, session routing, and channel binding. The entire multi-agent infrastructure collapses into a foreign key.
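The constraint also simplifies writes. Companion creation can be an idempotent upsert — a sketch; the exact set of updated columns is an assumption:

```sql
-- Create-or-update the single companion for a user. The UNIQUE constraint
-- on user_id makes this a safe idempotent upsert: no "which agent?" logic.
INSERT INTO companions (user_id, name, personality)
VALUES ($1, $2, $3)
ON CONFLICT (user_id) DO UPDATE
SET name = EXCLUDED.name,
    personality = EXCLUDED.personality,
    updated_at = now();
```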
What Got Eliminated and Why
| Eliminated Table | Why It Existed in v1 | Why It's Gone in v2 |
|---|---|---|
agents | Multiple personas per user | One companion per user, stored in companions |
sessions | Multi-agent, multi-channel routing | No sessions — messages are a flat stream per user |
channel_bindings | Telegram + web channel routing per agent | Single-platform native app |
onboarding_states | Multi-step state machine with branching | Conversational onboarding, state lives in the chat |
telegram_allowlist | Access control for the Telegram bot | No Telegram |
account_link_tokens | Cross-platform account linking tokens | Single auth system |
Messages moved from session-bound to user-bound. In v1, a message belonged to a session, which belonged to an agent, which belonged to a user. Three joins to answer "what did this user say?" In v2, messages carry a user_id directly. One WHERE clause. Done.
Memories moved from agent-bound to user-bound. Same simplification — your memories belong to you, not to a specific persona instance.
WebSocket Replaces SSE
v1 used Server-Sent Events for streaming LLM responses to the client. SSE works, but it's one-directional: server pushes to client. The client has to use separate HTTP requests to send messages.
v2 uses WebSocket. The reasons are concrete:
Bidirectional by nature. The roadmap leads to real-time voice chat. SSE can't do bidirectional audio streaming. WebSocket can. Rather than build on SSE now and rip it out later, start with WebSocket.
Proactive messaging is cleaner. In v1, proactive messages required the client to maintain a polling connection or a separate SSE channel. With WebSocket, the server pushes a proactive message the same way it pushes a chat response — same channel, same protocol.
Expo support. React Native's WebSocket support is mature and well-documented. SSE support in React Native requires polyfills and has edge cases around reconnection. WebSocket reconnection is a solved problem with libraries like reconnecting-websocket.
Heartbeat built in. WebSocket has native ping/pong frames for connection health. SSE relies on application-level keepalive, which is more fragile on mobile networks.
The WebSocket protocol is simple:
Client → Server: { type: "message", text, mediaIds? }
Server → Client: { type: "token", text } // streaming
Server → Client: { type: "done", messageId, emotionState }
Server → Client: { type: "voice", audio } // base64
Server → Client: { type: "proactive", text, emotionState }
Server → Client: { type: "emotion", state } // orb update
Six message types. The entire real-time communication layer fits in one WebSocket connection per user.
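The protocol above maps naturally onto a TypeScript discriminated union. The `type` values and field names are copied from the protocol; the payload types are assumptions:

```typescript
// The wire protocol as a discriminated union. Narrowing on `type` gives
// the client exhaustive handling of every server message.
type ClientMessage = { type: "message"; text: string; mediaIds?: string[] };

type ServerMessage =
  | { type: "token"; text: string }                           // streaming chunk
  | { type: "done"; messageId: string; emotionState: object }
  | { type: "voice"; audio: string }                          // base64 audio
  | { type: "proactive"; text: string; emotionState: object }
  | { type: "emotion"; state: object };                       // orb update

function handleServerMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "token":     return `stream: ${msg.text}`;
    case "done":      return `done: ${msg.messageId}`;
    case "voice":     return "voice payload";
    case "proactive": return `proactive: ${msg.text}`;
    case "emotion":   return "orb update";
  }
}
```

A chat response and a proactive message arrive through the same switch statement — which is exactly the "same channel, same protocol" point above.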
The Telegram Kill
This one hurt. Telegram was Mio v1's primary channel. It's where users actually talked to their companions. Killing it wasn't about Telegram being bad — it was about the new architecture making it pointless.
The fatal problem: chat history doesn't sync to Telegram's UI. If a user spends weeks talking to their companion in the native app, then opens Telegram, they see an empty chat. The companion knows everything, but the conversation history is invisible. That's not a "lite version" — that's a broken experience.
And the onboarding problem compounds it. v2's onboarding is conversational — the companion is born through dialogue. You can't do that in Telegram and then expect the user to switch to the app. If they have to download the app for onboarding anyway, why would they go back to Telegram?
The decision: no Telegram in v0-v1. If cross-platform presence matters later, an Apple Watch widget or Android widget makes more sense than a chat bot in someone else's platform.
The Stack
Putting it all together:
| Layer | v1 | v2 |
|---|---|---|
| Frontend | Next.js web + Telegram bot | Expo / React Native |
| Animations | CSS | React Native Skia |
| Server | Hono | Hono (kept) |
| Real-time | SSE | WebSocket |
| Database | Supabase + Drizzle | Supabase + Drizzle (kept) |
| Vector search | pgvector | pgvector (kept) |
| ORM | Drizzle | Drizzle (kept) |
Most of the server-side stack is unchanged. Hono stays because it's lightweight and works well. Supabase + Drizzle stays because the database layer is proven. pgvector stays because the memory system depends on it.
The big change is the client: from a web app built on Next.js to a native app built on Expo. The WeChat-style redesign was good engineering work, but the entire UI paradigm is being replaced. None of those 30+ components carry over. The new client is a single chat screen with an animated orb — closer to GPT's voice mode than to WeChat.
What This Means in Practice
The first commit of mio-v2 won't start from zero. It'll start from a packages/core directory that already contains:
- A working memory system with pgvector semantic search
- A complete media pipeline (TTS, STT, vision, browsing)
- A subscription and billing system
- A cost tracking system
- A model configuration system
- A system-prompt builder that takes parameters
That's a lot of infrastructure that doesn't need rebuilding. The new work is:
- A new database schema (4 tables, written above)
- A new WebSocket server (replacing SSE endpoints)
- A new Expo client (chat screen + animated orb)
- Conversational onboarding (personality emerges from dialogue)
- Modified emotion engine (no schedule, pure conversation-driven)
- Modified proactive messaging (no fake-life triggers)
The hardest part isn't the code. It's resisting the urge to rebuild what already works.
What's Next
The schema is designed. The module inventory is done. The keep/modify/drop decisions are made. Next is actually building it — standing up the Expo app, wiring the WebSocket, and watching the orb pulse for the first time.
But that's a Part 4 story.