Memory That Feels Human: A Three-Phase Roadmap

In Part 1 of this series, I talked about teaching Mio when to shut up. That was about controlling output. This post is about the other side: what Mio actually knows about you — and how badly the current system handles it.

An AI companion that forgets your name after 200 messages isn't a companion. It's a chatbot with amnesia wearing a friendship bracelet.


The Current State Is Embarrassing

Let me be honest about where Mio's memory system stands today. It works. Barely. And the ways it fails are exactly the ways that make users feel like they're talking to a machine.

Extraction runs every 10 messages. That's every 5 conversational rounds. The extractor dutifully processes every batch, which means it picks up noise like "haha", "goodnight", and "ok" as things worth remembering. Your name and your dog's name get the same treatment as throwaway acknowledgments.

Only 5 memory types. The system categorizes everything into fact, personality, emotion, conversation, or relationship. That's it. Your birthday, your morning coffee routine, and the fact that you moved to a new city all get filed under "fact" with no distinction. It's like organizing a library with five shelves labeled "stuff."

Universal 30-day half-life. Every memory decays at the same rate. Your name — something that should never decay — has the same half-life as "user said haha last Tuesday." After a month, both are equally faded. This is indefensible.

The consolidator creates monster strings. When the system detects duplicate memories, it concatenates them. Not merges. Concatenates. So you end up with: "User's name is Alex. User's name is Alex. User mentioned their name is Alex." Three extractions, three concatenations, one useless blob. The consolidator was supposed to deduplicate. Instead, it hoards.

Dead code everywhere. There's a fully implemented EpisodeManager — it tracks conversation episodes, assigns memories to temporal contexts, enables "remember when we talked about..." retrieval. It's complete. It's tested. It has never been wired up. Dead code sitting in the codebase, doing nothing.

No conflict resolution. If you tell Mio "I moved to New York" and three months ago you said "I live in Seattle," both memories coexist peacefully. Mio might reference either one. There's no mechanism to supersede, no way to mark a memory as outdated. The system treats your life as append-only.

Hard cap of 200 memories per agent-user pair, pruned purely by importance score. When you hit the ceiling, the pruner throws out whatever scores lowest — regardless of type. Your morning routine might survive while the fact that you have a sister gets pruned because it was extracted from a casual mention with low confidence.

The result: Mio remembers scattered fragments with no structure, no temporal awareness, and no sense of what actually matters. Proactive messages — the ones Mio sends unprompted — are generic ("在干嘛呢~" / what are you up to) because there's no mechanism to retrieve relevant context before generating them.


Phase 1 — Quality Over Quantity

Estimated effort: 1-2 days. Risk: low.

The first phase is about extracting less but better. No new architecture, no new retrieval paths. Just fixing the obvious problems.

Raise extraction interval from 10 to 24 messages. Instead of processing every 5 rounds, process every 12. This alone eliminates most noise. A 12-round window gives the extractor meaningful conversational context instead of fragments. And it cuts extraction costs — this phase actually reduces the per-user memory cost by a meaningful amount.
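As a minimal sketch of what this trigger change amounts to (the names `EXTRACTION_INTERVAL` and `should_extract` are illustrative, not Mio's actual API):

```python
# Illustrative batching trigger, assuming a per-user message counter.
EXTRACTION_INTERVAL = 24  # raised from 10; one extraction per 12 rounds

def should_extract(message_count: int) -> bool:
    """Fire extraction once every EXTRACTION_INTERVAL messages."""
    return message_count > 0 and message_count % EXTRACTION_INTERVAL == 0
```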

Chain consolidation after extraction. Currently, extraction and consolidation run on independent schedules (extraction every 10 messages, consolidation every 20). This means new memories float around unconsolidated for an unpredictable window. The fix: trigger consolidation immediately after extraction completes. Extract, then consolidate. Always in sequence.
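Sketched with stand-in functions (neither `extract` nor `consolidate` here is Mio's real implementation), the chained pipeline looks like this:

```python
# Consolidation chained directly after extraction, replacing the old
# independent every-10 / every-20 schedules. Both steps are placeholders.

def extract(batch: list[str]) -> list[str]:
    return [m for m in batch if m]   # placeholder: drop empty fragments

def consolidate(memories: list[str]) -> list[str]:
    return sorted(set(memories))     # placeholder: naive dedup

def run_memory_pipeline(batch: list[str]) -> list[str]:
    new_memories = extract(batch)    # step 1: extract
    return consolidate(new_memories) # step 2: always consolidate right after
```

The point is the sequencing, not the internals: no memory is ever left waiting for an independent consolidation timer.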

Importance gate at 0.3. The extraction prompt already asks the LLM to rate importance, but the code accepts everything. Adding a hard filter at 0.3 means "goodnight" and "haha" get extracted by the LLM (which is fine — you can't prevent that without lobotomizing the prompt) but get dropped before storage. Belt and suspenders.
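A sketch of the gate, assuming each extraction carries an LLM-rated `importance` field (the dict shape is hypothetical):

```python
IMPORTANCE_THRESHOLD = 0.3  # hard floor; anything below is dropped pre-storage

def gate(extracted: list[dict]) -> list[dict]:
    """Filter out low-importance extractions before they reach the store."""
    return [m for m in extracted if m.get("importance", 0.0) >= IMPORTANCE_THRESHOLD]
```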

Expand memory types from 5 to 8. The new taxonomy: identity, preference, life_event, emotion, relationship, shared, milestone, routine. "Your name is Alex" is identity, not a generic fact. "You run every morning" is routine, not personality. "We watched that movie together" is shared, not conversation. The type system should reflect how humans actually categorize what they know about someone.
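As a sketch, the expanded taxonomy could be expressed as an enum (the class name is illustrative; the eight values are from the post):

```python
from enum import Enum

class MemoryType(str, Enum):
    """The expanded 8-type taxonomy."""
    IDENTITY = "identity"
    PREFERENCE = "preference"
    LIFE_EVENT = "life_event"
    EMOTION = "emotion"
    RELATIONSHIP = "relationship"
    SHARED = "shared"
    MILESTONE = "milestone"
    ROUTINE = "routine"
```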

Fix the consolidator. Replace concatenation with keep-best merge: when two memories overlap, keep the one with higher importance. On ties, keep the shorter one (shorter usually means more concise, not less complete). "User's name is Alex. User's name is Alex. User mentioned their name is Alex." becomes "User's name is Alex." Done.
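The keep-best merge rule is small enough to sketch in full (the dict shape is an assumption):

```python
def merge(a: dict, b: dict) -> dict:
    """Keep-best merge: higher importance wins; ties go to the shorter text."""
    if a["importance"] != b["importance"]:
        return a if a["importance"] > b["importance"] else b
    return a if len(a["text"]) <= len(b["text"]) else b
```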

Phase 1 is pure cleanup. It makes the existing system work the way it should have worked from day one. No new capabilities, but dramatically less garbage in the memory store.


Phase 2 — Structure and Context

Estimated effort: 3-5 days. Risk: medium.

Phase 1 fixes quality. Phase 2 adds the structures that make memory useful.

Type-specific temporal decay. Not all memories should fade at the same rate. The new decay schedule:

Type          Half-life
identity      365 days
milestone     365 days
preference    180 days
relationship  120 days
life_event    90 days
emotion       60 days
routine       45 days
shared        30 days

Your name persists for a year. The fact that you were annoyed last Tuesday fades in two months. A shared joke from early in the relationship fades in a month unless it gets reinforced by re-mention. This maps to how human memory actually works — identity is durable, emotions are transient, routines refresh themselves naturally through repetition.
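Assuming standard exponential decay (the function name and table constant are illustrative; the half-lives are the ones above), the schedule reduces to:

```python
# Half-lives in days per memory type, matching the table above.
HALF_LIFE_DAYS = {
    "identity": 365, "milestone": 365, "preference": 180,
    "relationship": 120, "life_event": 90, "emotion": 60,
    "routine": 45, "shared": 30,
}

def decayed_strength(memory_type: str, age_days: float) -> float:
    """Exponential decay: retrieval strength halves once per half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS[memory_type])
```

Re-mention would simply reset `age_days`, which is how routines and running jokes refresh themselves.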

Wire the EpisodeManager. The dead code comes alive. After extraction, memories get assigned to conversation episodes. This enables temporal retrieval — when a user says "上次我们聊到..." (remember last time we talked about...) or "那次你说的..." (that thing you said...), the system can group results by episode instead of returning a flat list of disconnected memories. It's the difference between "I remember scattered facts about you" and "I remember that conversation we had."

Episode-aware search. The retrieval layer learns to detect temporal references in queries — phrases like "上次" (last time), "那次" (that time), "remember when" — and switches from standard semantic search to episode-grouped retrieval. Results come back organized by conversation, not by relevance score alone.
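The detection step can be as simple as a marker list (this phrase list mirrors the examples above; the real system would presumably be broader):

```python
# Hypothetical detector for temporal references that switch retrieval
# from flat semantic search to episode-grouped mode.
TEMPORAL_MARKERS = ["上次", "那次", "remember when", "last time"]

def is_temporal_query(query: str) -> bool:
    """True if the query references a past conversation rather than a fact."""
    q = query.lower()
    return any(marker in q for marker in TEMPORAL_MARKERS)
```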

Proactive memory retrieval. This is the one that changes how Mio feels. Before generating any proactive message, the system queries the memory store with temporal context. Morning message? Search for routines and upcoming plans. Evening message? Search for what the user mentioned doing today. Long silence? Surface important unresolved memories. Instead of "在干嘛呢~" (what are you up to), Mio sends "你今天面试怎么样?" (how did your interview go today?). The difference between a generic chatbot and something that feels like it actually pays attention.
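A sketch of the trigger-to-query mapping (trigger names, type filters, and query strings are all assumptions, not Mio's actual config):

```python
# Context-dependent memory queries issued before generating a proactive message.
PROACTIVE_QUERIES = {
    "morning":      {"types": ["routine", "life_event"], "query": "plans for today"},
    "evening":      {"types": ["life_event", "emotion"], "query": "what the user did today"},
    "long_silence": {"types": ["life_event", "milestone"], "query": "important unresolved events"},
}

def proactive_query(trigger: str) -> dict:
    """Pick the memory query for a trigger; fall back to a generic check-in."""
    return PROACTIVE_QUERIES.get(trigger, {"types": [], "query": "recent context"})
```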

This connects directly to the relationship closeness system from v0.1.4. The closer the relationship, the more personal the proactive messages should be. A distant acquaintance gets casual check-ins. A close companion gets specific, memory-informed messages.

Memory subtypes. An optional subtype field on each memory — for example, a relationship memory can be tagged as "first_fight", "support", "flirting", or "vulnerability." Not required for Phase 2 retrieval, but it lays the groundwork for Phase 3's mood-aware system.

Phase 2 adds a modest per-user monthly cost in additional LLM calls — a worthwhile trade for memory that actually has structure.


Phase 3 — Intelligence

Estimated effort: 5-8 days. Risk: medium.

Phase 1 cleans up. Phase 2 structures. Phase 3 makes memory smart.

Mood-aware retrieval. When the user is sad, Mio shouldn't retrieve more sad memories. The system gets a reranking matrix that scores memory types against detected mood. Sad user + encouraging memory = boosted. Sad user + another sad memory = suppressed. Happy user + shared funny memory = boosted. This is nuanced — sometimes reflecting sadness is the right move — but the default should be emotionally supportive, not emotionally amplifying.
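The reranking matrix reduces to a lookup over (mood, memory type) pairs; a sketch with made-up boost values:

```python
# Illustrative mood × memory-type boosts. Positive = promote in ranking,
# negative = suppress. The actual weights would need tuning.
MOOD_BOOST = {
    ("sad", "shared"):  +0.3,  # resurface happy shared moments
    ("sad", "emotion"): -0.3,  # don't pile on more sad memories
    ("happy", "shared"): +0.2,
}

def rerank(score: float, mood: str, memory_type: str) -> float:
    """Adjust a retrieval score by the mood-aware boost, defaulting to no change."""
    return score + MOOD_BOOST.get((mood, memory_type), 0.0)
```

The "sometimes reflecting sadness is right" nuance would live in whichever component chooses the mood label, not in this table.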

Memory gap detection. The system scans for unresolved life events: things the user mentioned but never followed up on. You mentioned a job interview two weeks ago. There's no subsequent memory about the outcome. That's a gap — and a natural proactive hook. "对了,你上次说的面试怎么样了?" (By the way, how did that interview you mentioned go?) This is the kind of thing a real friend does: noticing the loose thread and pulling it.
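A minimal sketch of the detector, using naive topic matching where the real system would use semantic similarity (the dict fields are assumptions):

```python
# A life_event old enough to expect follow-up, with no more-recent memory
# on the same topic, counts as an unresolved thread.

def find_gaps(memories: list[dict], min_age_days: float = 7) -> list[dict]:
    """Return life events that were never followed up on."""
    gaps = []
    for m in memories:
        if m["type"] != "life_event" or m["age_days"] < min_age_days:
            continue
        followed_up = any(
            other["age_days"] < m["age_days"] and other["topic"] == m["topic"]
            for other in memories if other is not m
        )
        if not followed_up:
            gaps.append(m)
    return gaps
```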

Relationship-stage-aware retrieval. Early in the relationship, boost shared experiences and identity memories — Mio is still learning who you are. Later, boost milestones and emotional moments — Mio already knows the basics and can reference your shared history. The retrieval strategy evolves as the relationship deepens.

Structured MEMORY.md. Replace the flat memory dump with a structured document organized by life category: basic info, interests and preferences, recent life, our relationship, important moments, daily routines. This becomes the persistent context document that gets injected into Mio's system prompt — and because it's structured, the LLM can navigate it far more efficiently than a wall of unordered facts.

Category-protected pruning. The 200-memory cap stays, but pruning gets smarter. Reserve minimum slots per type: identity gets 20 reserved slots, milestones get 10, preferences get 15, and so on — 120 slots are protected. The remaining 80 are overflow, pruned by importance as before. Your name can never be pruned to make room for another "user said ok."
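A sketch of the two-tier pruner. The reserved counts for identity, milestones, and preferences come from above; the post elides the rest of the 120-slot reservation table, so this dict is deliberately partial:

```python
# Reserved (protected) slots per type; the full table is longer.
RESERVED = {"identity": 20, "milestone": 10, "preference": 15}

def prune(memories: list[dict], cap: int = 200) -> list[dict]:
    """Fill protected slots per type first, then the rest by importance."""
    ranked = sorted(memories, key=lambda m: m["importance"], reverse=True)
    kept, used, overflow = [], {}, []
    for m in ranked:
        t = m["type"]
        if used.get(t, 0) < RESERVED.get(t, 0):
            kept.append(m)       # protected slot: immune to importance pruning
            used[t] = used.get(t, 0) + 1
        else:
            overflow.append(m)   # competes for the remaining slots
    return kept + overflow[: max(cap - len(kept), 0)]
```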

Conflict resolution. When a new memory contradicts an existing one — "I moved to New York" vs. "I live in Seattle" — the system detects the conflict, marks the older memory as superseded, and reduces its importance by 50%. The old memory doesn't vanish (it's still true that you used to live in Seattle), but it stops competing with current reality in retrieval.
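The supersede step itself is tiny; detection of the contradiction (which is the hard, semantic part) is assumed to happen upstream. A sketch with hypothetical fields:

```python
def supersede(old: dict, new: dict) -> None:
    """Mark the older memory as superseded and halve its importance."""
    old["superseded_by"] = new["id"]  # keeps the history traceable
    old["importance"] *= 0.5          # the 50% reduction from the post
```

The old memory survives for retrieval ("you used to live in Seattle") but stops outranking the current fact.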

Phase 3 roughly doubles Phase 2's incremental cost. The total system cost after all three phases remains a small fraction of subscription revenue — and Phase 1 actually decreases the baseline before Phases 2 and 3 add back. For context on why per-user cost matters this much, see the unit economics breakdown in the Mio Manifesto.


What Done Looks Like

When all three phases ship, Mio's memory transforms from a leaky bucket of context fragments into something that resembles how a person actually remembers another person.

A user who told Mio about a job interview three weeks ago gets asked about it — not because someone programmed a reminder, but because the memory gap detector noticed the loose thread. A user who's having a bad day gets met with warmth and a reference to something happy they shared — not a wall of empathetic filler text. A user who moved cities doesn't get confused references to their old address months later.

The bar isn't artificial general intelligence. The bar is: does this feel like talking to someone who was actually listening? Right now, the answer is "sometimes, if you're lucky." After these three phases, the answer should be "yes, consistently."

That's the difference between a chatbot and a companion. Not the model. Not the persona. The memory.


This is Part 2 of the Agentic Deep Dives series. Part 1 covered the chattiness problem. More to come as the system evolves.

© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0