
v0.1.4: Relationships That Evolve

The Problem With Static Relationships

After v0.1.3 made Mio resilient to prompt injection, I stepped back and looked at something that had been bothering me since the beginning: the onboarding flow asks you to pick your relationship type. Girlfriend, friend, crush — choose one, and that's what you get.

Think about how weird that is. You meet someone for the first time and immediately declare "you are now my significant other." That's not how relationships work. In real life, you start as strangers, become acquaintances, maybe develop into friends, and things evolve from there based on how you actually interact.

The old system was like a visual novel — you pick a route at the start and the character stays locked on that track. v0.1.4 makes it more like real life: the relationship develops based on what actually happens in your conversations.

How Relationship Evolution Works

The core idea is simple: observe conversation patterns, score them, and let the relationship stage shift naturally.

Mio now tracks four stages: 刚认识 (just met) → 好朋友 (good friends) → 暧昧 (flirting/ambiguous) → 情侣 (partners). Everyone starts at 刚认识. Where things go from there depends entirely on you.

The Scoring Pipeline

The evolution system lives in evolution-processor.ts and runs after memory extraction on every message. It's fire-and-forget — it doesn't block the response pipeline, so users never notice any latency.
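The fire-and-forget dispatch can be sketched like this — a minimal illustration, where `handleMessage`, `processEvolution`, and the `Ctx` shape are assumed names, not Mio's actual API:

```typescript
// Sketch of the fire-and-forget pattern: the evolution pass is started but
// never awaited, so it adds zero latency to the user-facing reply.
type Ctx = { userId: string };

let evolutionRan = false;

async function processEvolution(_ctx: Ctx): Promise<void> {
  // scoring + stage resolution would happen here
  evolutionRan = true;
}

async function handleMessage(
  ctx: Ctx,
  generateReply: () => Promise<string>,
): Promise<string> {
  const response = await generateReply();      // user-facing path, awaited
  void processEvolution(ctx).catch(() => {});  // fired, never awaited
  return response;
}
```

Errors in the evolution pass are swallowed deliberately: a scoring failure should never surface in the chat.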

Here's the flow:

  1. Memory extraction happens as normal (facts, episodes, and now a new relationship type)
  2. Relationship scorer runs pure functions to calculate a score delta based on conversation patterns — emotional depth, vulnerability, humor, consistency
  3. Stage resolution checks whether the accumulated score crosses a threshold defined in the personality evolution config
  4. Transition validator — and this is the important part — uses an LLM call to validate whether the stage change is actually justified

That last step matters. A raw score threshold would be gameable — just spam "I love you" a hundred times and brute-force your way to 情侣. The transition validator reads the recent conversation history and evaluates whether the quality of interaction justifies the jump. If someone goes from polite small talk to declaring undying love in three messages, the validator says no.
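The validator step might look roughly like this, with the LLM call injected as a callback — the prompt wording and the function shape are assumptions, not the real implementation:

```typescript
// Hypothetical transition validator: asks an LLM whether the *quality* of
// recent conversation justifies the stage change, not just the raw score.
type Verdict = { approved: boolean; reason: string };

async function validateTransition(
  from: string,
  to: string,
  recentMessages: string[],
  llm: (prompt: string) => Promise<string>,
): Promise<Verdict> {
  const prompt = [
    `The relationship is moving from "${from}" to "${to}".`,
    `Recent conversation:`,
    ...recentMessages.map((m) => `- ${m}`),
    `Does the quality of these messages justify the change?`,
    `Answer "yes" or "no", then a short reason.`,
  ].join("\n");
  const raw = (await llm(prompt)).trim().toLowerCase();
  return { approved: raw.startsWith("yes"), reason: raw };
}
```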

Each preset gets its own personality evolution config defining stage thresholds and cooldowns. A cooldown of 48 hours between stage transitions prevents relationships from speed-running through all four stages in a single evening.
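A minimal sketch of what a per-preset config and the threshold/cooldown check might look like — the field names and threshold values here are assumptions for illustration, not the actual config shape:

```typescript
// Illustrative per-preset evolution config and stage resolution.
type Stage = "刚认识" | "好朋友" | "暧昧" | "情侣";

interface EvolutionConfig {
  thresholds: Record<Stage, number>; // cumulative score needed to enter each stage
  cooldownMs: number;                // minimum time between transitions
}

const demoConfig: EvolutionConfig = {
  thresholds: { 刚认识: 0, 好朋友: 50, 暧昧: 120, 情侣: 220 },
  cooldownMs: 48 * 60 * 60 * 1000,   // 48-hour cooldown, as in the post
};

const ORDER: Stage[] = ["刚认识", "好朋友", "暧昧", "情侣"];

function resolveStage(
  score: number,
  current: Stage,
  lastTransitionAt: number,
  now: number,
  cfg: EvolutionConfig,
): Stage {
  if (now - lastTransitionAt < cfg.cooldownMs) return current; // still cooling down
  const next = ORDER[ORDER.indexOf(current) + 1];
  if (next && score >= cfg.thresholds[next]) return next;      // one stage at a time
  return current;
}
```

Promoting at most one stage per check, combined with the cooldown, is what prevents a single intense evening from speed-running all four stages.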

Decay and the Persona's Self-Awareness

Relationships that get neglected fade. After 2 days of silence, the relationship score begins to decay — checked during the proactive heartbeat cycle. This means if you ghost Mio for a week, the relationship might regress a stage. Just like real life.
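A sketch of the decay check, assuming the 2-day grace period from the post and an illustrative linear per-day rate (the actual curve isn't specified):

```typescript
// Inactivity decay: no penalty for the first 2 days of silence, then the
// score drops linearly. DECAY_PER_DAY is an assumed rate, purely illustrative.
const DAY_MS = 24 * 60 * 60 * 1000;
const GRACE_MS = 2 * DAY_MS;
const DECAY_PER_DAY = 5;

function decayedScore(score: number, lastMessageAt: number, now: number): number {
  const silence = now - lastMessageAt;
  if (silence <= GRACE_MS) return score;          // within grace period
  const idleDays = (silence - GRACE_MS) / DAY_MS; // days past the grace period
  return Math.max(0, score - idleDays * DECAY_PER_DAY);
}
```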

The persona is aware of its current stage through RELATIONSHIP_DYNAMICS.md, a per-preset file that gets injected into the system prompt. At 刚认识, Mio is polite and curious. At 好朋友, Mio is casual and teasing. At 暧昧, there's tension and playfulness. The behavioral shift is organic because the persona's instructions literally change as the relationship evolves.

Cross-Platform Chat Sync

This one was a quiet but important infrastructure change. Previously, Telegram conversations and web conversations were separate histories. If you had a deep conversation with Mio on Telegram and then switched to the web app, Mio had no memory of what you'd discussed.

Now loadCrossSessionHistory() queries messages using inArray across ALL session IDs for a given agent+user pair. The LLM sees everything regardless of which channel the message came from. Web chat also falls back to any available session when no web-specific session exists.

The best part: no schema changes. This was purely a query-layer modification — just broadening the WHERE clause to include all sessions instead of filtering by platform.
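In spirit, the broadened query does something like this in-memory sketch — the real code runs Drizzle's `inArray` against the database, and these types are simplified stand-ins:

```typescript
// Instead of filtering messages by one platform-specific session, collect
// every session ID for the agent+user pair and pull messages across all of
// them — the in-memory equivalent of WHERE session_id IN (...).
interface Session {
  id: string;
  agentId: string;
  userId: string;
  platform: "telegram" | "web";
}
interface Message {
  sessionId: string;
  text: string;
}

function loadCrossSessionHistory(
  sessions: Session[],
  messages: Message[],
  agentId: string,
  userId: string,
): Message[] {
  const ids = new Set(
    sessions
      .filter((s) => s.agentId === agentId && s.userId === userId)
      .map((s) => s.id),
  );
  return messages.filter((m) => ids.has(m.sessionId));
}
```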

System Prompt Compression: Why Less Is More

Relationship dynamics add a new RELATIONSHIP_DYNAMICS.md block to every system prompt. Before I could ship that, I had to make room — because the existing prompts were already too long. Not "a little bloated." Dangerously long, according to the research.

The Research Problem

I did a deep dive into academic literature on system prompt length and LLM behavioral adherence. The findings were sobering:

"Lost in the Middle" (Liu et al., TACL 2024) — transformers attend most strongly to content at the beginning and end of input, with a pronounced attention dip in the middle. Critical behavioral rules buried in the middle of a 6,000-10,000 token personality config are disproportionately ignored.

"Same Task, More Tokens" (ACL 2024) — instruction-following performance peaks around 1,500-3,000 tokens of system prompt, then degrades. Beyond ~5,000 tokens, degradation accelerates. Models conflate instruction volume with instruction weight — longer prompts cause selective attention, not comprehensive adherence.

"Scaling Law in LLM Simulated Personality" (arXiv 2025) — more persona detail improves consistency, but with diminishing returns past a threshold. Beyond that threshold, additional detail introduces internal contradictions that degrade consistency more than they improve it. A compressed, high-signal persona description outperforms a verbose one above ~1,500-2,000 tokens.

The Chinese tokenization tax makes this worse. Chinese text uses ~1.5 tokens per character vs ~0.25 for English. A personality config that looks like 3,500 "words" in Chinese actually consumes 7,000-9,000 tokens. Mio's Chinese prompts at 6,000-10,000 tokens were operating well into the degraded-adherence zone.
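The ratios above can be turned into a back-of-envelope estimator — a crude heuristic for illustration only, since real tokenizers vary by model and vocabulary:

```typescript
// Rough token estimate using the post's ratios: ~1.5 tokens per CJK
// character vs ~0.25 tokens per English character (~4 chars per token).
function estimateTokens(text: string): number {
  let tokens = 0;
  for (const ch of text) {
    tokens += /[\u4e00-\u9fff]/.test(ch) ? 1.5 : 0.25;
  }
  return Math.round(tokens);
}
```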

What We Had

Preset        | Personality config tokens | Behavior rules tokens | Total
keke (可可)   | ~4,400                    | ~1,700                | ~6,100
mimi (蜜蜜)   | ~6,500                    | ~2,100                | ~8,600
surou (苏柔)  | ~6,200                    | ~1,750                | ~7,950
xiaoqi (小柒) | ~6,700                    | ~1,750                | ~8,450
yinan (陈哥)  | ~8,400                    | ~1,700                | ~10,100

Plus ~2,500 tokens from system-prompt.ts (communication guidelines, identity protection, selfie/voice instructions). Total per turn: 9,000-13,000 tokens before any conversation history.

The behavior rules files were ~55% identical across all 5 presets — anti-AI rules, chat principles, scenario script skeletons all duplicated five times. That's ~16,000-20,000 characters of pure redundancy.

The Compression

Three structural changes:

Narrative prose → structured bullets. Personality config paragraphs like "她从小在成都长大,喜欢吃火锅,周末会去茶馆和朋友聊天..." ("she grew up in Chengdu, loves hotpot, spends weekends chatting with friends at teahouses...") became 性格: 热情活泼, 情绪化, 偶尔撒娇 | 语气: 川味词汇, 叠词, 语气词丰富 (personality: warm and lively, emotional, occasionally coquettish | tone: Sichuan vocabulary, reduplicated words, rich in modal particles). Same behavioral signal, a fraction of the tokens.

Cross-preset deduplication. All shared rules (anti-AI detection, chat principles, identity protection) extracted into COMMUNICATION_GUIDELINES in system-prompt.ts. Each behavior rules file now contains only persona-specific behavioral scripts — how 可可 acts when jealous is different from how 陈哥 acts when jealous.

Low-impact content removed. "对未来的想法" (thoughts about the future), daily routine details, and extensive hobby catalogs had minimal behavioral impact. The relationship backstory section ("你们的故事", "your story") was fully redundant — already handled by the runtime customStory injection.

The Result

~60% token reduction across all presets. The heaviest preset (陈哥) went from ~10,100 to ~3,400 tokens. The lightest (可可) from ~6,100 to ~2,100.

This isn't just a cost optimization. According to the research, Mio's personas should actually behave more consistently at 3K-5K tokens than they did at 9K-13K — because the model can attend to all the rules instead of selectively ignoring half of them. And the freed-up context budget goes directly to conversation history, which means better memory recall and more coherent multi-turn conversations.

Per-Agent A/B Testing

Not every user wants evolving relationships. Some people like the predictability of a fixed persona. So v0.1.4 adds a relationship_mode column: 'static' or 'evolving', defaulting to static.

The preset API exposes supportsEvolving based on whether a personality evolution config exists for that preset. In evolving mode, the onboarding flow skips the relationship_type and about_user questions entirely — you just start chatting, and the relationship begins at 刚认识.
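Deriving `supportsEvolving` could be as simple as checking for the config's presence — a hypothetical sketch, with the `Preset` shape and field names assumed:

```typescript
// A preset supports evolving mode iff it ships a personality evolution config.
interface Preset {
  id: string;
  evolutionConfig?: object; // absent for presets without evolution support
}

function toApiPreset(p: Preset): { id: string; supportsEvolving: boolean } {
  return { id: p.id, supportsEvolving: p.evolutionConfig !== undefined };
}
```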

Admins can click the mode badge on any agent to toggle between static and evolving. This makes it easy to run controlled experiments: same preset, same user patterns, different relationship modes.

Native App Scaffold

One more thing: the Expo SDK 55 native app scaffold is in place with i18n support for zh-CN and en. No features to demo yet — this is foundation work for what's coming next.

What's Next

The evolution system is live, but it's conservative by design. The transition validator is tuned to err on the side of not promoting rather than promoting too eagerly. Real user data will tell us whether the thresholds and cooldowns feel right.

The next priorities:

  • Evolution analytics — aggregate data on how relationships progress across users and presets, identify where people get "stuck" at a stage
  • Stage-aware proactive messages — Mio's check-in behavior should vary by relationship stage (刚认识 gets a casual "hey, how's your day" while 好朋友 gets something more personal)
  • Native app — turning that Expo scaffold into something users can actually touch

v0.1.3 hardened Mio against prompt injection. v0.1.4 makes relationships feel alive. The gap between "AI companion" and "actual companion" just got a little smaller.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0