Mio v2: Contact-Based Memory
The Problem
Mio's memory system stored everything as flat rows in a memories table. Each row held content text, a vector embedding, and some metadata. Semantic search worked fine for queries like "what does the user like to eat": the embedding for that query lands close to "user prefers spicy food" and you get a relevant hit.
People break this model.
If a user mentions their friend Xiaohong across five different conversations — she works at Tencent, she lives in Shenzhen, they met in college, she got promoted, she's been stressed lately — those become five unrelated rows scattered across the table. There's no structural link between them. Retrieving "everything about Xiaohong" means falling back to ILIKE '%小红%', which is an O(N) full-table scan with no semantic understanding.
And it gets worse. "小红最近怎么样了" (how's Xiaohong doing lately) has near-zero embedding similarity to "小红在腾讯当工程师" (Xiaohong works at Tencent as an engineer). One is a question about recent wellbeing. The other is a factual statement about employment. The embeddings live in completely different regions of vector space. So when a user asks about their friend, the model literally cannot recall what it knows. The memories exist. They just can't be found.
Contacts as First-Class Entities
The fix is straightforward once you see it: give people their own table. A contacts table where each row represents a person in the user's life, with structural links back to every memory that mentions them.
CREATE TABLE contacts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
name TEXT NOT NULL,
aliases TEXT[] DEFAULT '{}',
relationship TEXT,
attributes JSONB DEFAULT '{}',
first_mentioned_at TIMESTAMPTZ DEFAULT NOW(),
last_mentioned_at TIMESTAMPTZ DEFAULT NOW(),
mention_count INTEGER DEFAULT 1,
memory_ids UUID[] DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
The important column is memory_ids — a UUID array linking the contact to every memory row that references them. No more scanning. You look up the contact, you get the IDs, you fetch the memories directly.
attributes is a JSONB column that holds LLM-generated structured data: job title, birthday, personality traits, interests, a chronological timeline of events. The consolidator (more on that below) fills this in over time as the system accumulates enough raw material.
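As a hedged sketch, the contact row can be modeled like this in TypeScript. The field names mirror the schema above; the attributes keys are just the categories the consolidator fills in (job, birthday, personality, interests, timeline), not a fixed schema, and the example values are illustrative:

```typescript
// Sketch of the contact row shape. ContactAttributes lists the categories
// the consolidator populates; real rows may hold any JSONB payload.
interface ContactAttributes {
  job?: string;
  birthday?: string;
  personality?: string[];
  interests?: string[];
  timeline?: string[];
}

interface Contact {
  id: string;
  userId: string;
  name: string;
  aliases: string[];
  relationship: string | null;
  attributes: ContactAttributes;
  mentionCount: number;
  memoryIds: string[];
}

// Illustrative example, not real data.
const example: Contact = {
  id: "00000000-0000-0000-0000-000000000001",
  userId: "00000000-0000-0000-0000-000000000002",
  name: "小红",
  aliases: [],
  relationship: "friend",
  attributes: {
    job: "engineer at Tencent",
    personality: ["outgoing", "competitive", "caring"],
    timeline: ["met user in college", "moved to Shenzhen", "promoted last quarter"],
  },
  mentionCount: 5,
  memoryIds: [],
};
```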
The Pipeline
When the user says "小红最近工作压力很大" (Xiaohong has been really stressed at work lately), three things happen at different cadences.
Extraction runs every ~10 messages. The MemoryAccumulator processes the conversation buffer, identifies extractable memories, and classifies each one by subtype. When it sees person-related subtypes — friend, family, colleague — it calls syncContacts(), which upserts the contact row and appends the new memory's UUID to memory_ids.
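The upsert that syncContacts() performs can be sketched in memory like this. A Map stands in for the contacts table, and the function and field names are assumptions, not the actual implementation:

```typescript
// In-memory sketch of the syncContacts() upsert: create the contact row if
// it doesn't exist, otherwise append the new memory's UUID and bump the
// mention counter. A Map stands in for the contacts table.
type ContactRow = { name: string; memoryIds: string[]; mentionCount: number };

function syncContact(
  contacts: Map<string, ContactRow>,
  name: string,
  memoryId: string,
): ContactRow {
  const existing = contacts.get(name);
  if (existing) {
    // Append only if this memory isn't already linked.
    if (!existing.memoryIds.includes(memoryId)) existing.memoryIds.push(memoryId);
    existing.mentionCount += 1;
    return existing;
  }
  const row: ContactRow = { name, memoryIds: [memoryId], mentionCount: 1 };
  contacts.set(name, row);
  return row;
}
```

In the real system this is a single SQL upsert per contact, but the shape is the same: one row per person, accumulating memory UUIDs over time.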
Consolidation runs every 50 messages. PersonConsolidator gathers all linked memories for each contact, sends them to an LLM, and asks it to produce a structured profile: a personality paragraph, filled-in attributes, and a chronological timeline of events. The source fragments then get their importance score dropped to 0.05 so they stop competing with the consolidated profile in retrieval.
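The down-ranking step after consolidation can be sketched like this. The field names are hypothetical; the halving-with-a-floor logic and the 0.05 value come from the description above (the SQL analog would be GREATEST(0.05, importance * 0.5)):

```typescript
// Sketch of the post-consolidation down-ranking pass. Source fragments get
// their importance halved with a floor of 0.05, and are tagged with the
// profile that absorbed them so the next consolidation run can skip them.
type Memory = {
  id: string;
  importance: number;
  metadata: { consolidated_into?: string };
};

function downrankFragments(fragments: Memory[], profileId: string): void {
  for (const m of fragments) {
    // Halve importance but clamp at 0.05 so fragments stay retrievable
    // without competing with the consolidated profile.
    m.importance = Math.max(0.05, m.importance * 0.5);
    // Mirrors the (metadata->>'consolidated_into') IS NULL filter used to
    // keep the consolidator from re-processing these fragments.
    m.metadata.consolidated_into = profileId;
  }
}
```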
Retrieval runs every message. The aggregator checks whether any contact names appear in the user's input. If so, it pulls the contact row, builds a contact card, and injects it straight into the system prompt. No embedding search involved at all.
The contact card looks like this:
### 小红
Relationship: friend
Job: engineer at Tencent
Location: Shenzhen
Personality: outgoing, competitive, caring
Key events:
- User met 小红 in college
- She moved to Shenzhen for work
- She got promoted last quarter
This goes into the system prompt before the model generates its reply, so it has full context about every mentioned person without needing to "search" for anything.
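Rendering the card from a contact row is straightforward string assembly. This is a hedged sketch; the type and function names are assumptions, but the output format follows the example card above:

```typescript
// Sketch of contact-card rendering: turn a contact row into the text block
// that gets injected into the system prompt. Omits empty fields.
type Card = {
  name: string;
  relationship?: string;
  attributes: { job?: string; location?: string; personality?: string[] };
  timeline: string[];
};

function renderContactCard(c: Card): string {
  const lines = [`### ${c.name}`];
  if (c.relationship) lines.push(`Relationship: ${c.relationship}`);
  if (c.attributes.job) lines.push(`Job: ${c.attributes.job}`);
  if (c.attributes.location) lines.push(`Location: ${c.attributes.location}`);
  if (c.attributes.personality?.length) {
    lines.push(`Personality: ${c.attributes.personality.join(", ")}`);
  }
  if (c.timeline.length) {
    lines.push("Key events:");
    for (const event of c.timeline) lines.push(`- ${event}`);
  }
  return lines.join("\n");
}
```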
Two-Phase Retrieval
The actual query strategy layers structural lookup with a text fallback:
contactMemorySearch(userId, names, contactManager, limit):
1. linkedIds = contactManager.getMemoryIdsForMentionedContacts(userId, names)
2. if linkedIds.length > 0:
SELECT ... WHERE id = ANY(linkedIds) OR content ILIKE ANY(patterns)
else:
SELECT ... WHERE content ILIKE ANY(patterns)
Linked IDs come first: a single indexed lookup on the contacts table. ILIKE serves as a fallback for memories that existed before the contact was synced, or where sync failed silently (which happens more than I'd like). Over time, as more memories get linked, the ILIKE fallback fires less and less. The system self-improves just by running.
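The pseudocode above can be sketched in TypeScript against in-memory arrays. In the real system each branch is a separate SQL query; here plain arrays stand in for the tables, and all names are hypothetical:

```typescript
// In-memory sketch of the two-phase lookup. Phase 1 uses the structurally
// linked UUIDs plus a name match (for pre-sync memories); phase 2 is the
// pure text fallback, standing in for the ILIKE path.
type Mem = { id: string; content: string };

function contactMemorySearch(
  memories: Mem[],
  linkedIds: string[],
  names: string[],
  limit: number,
): Mem[] {
  const nameHit = (m: Mem) => names.some((n) => m.content.includes(n));
  if (linkedIds.length > 0) {
    const ids = new Set(linkedIds);
    return memories.filter((m) => ids.has(m.id) || nameHit(m)).slice(0, limit);
  }
  return memories.filter(nameHit).slice(0, limit);
}
```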
Bugs I Hit
The array index bug was the worst. storedIds[i] was supposed to correspond to personMemories[i], but storedIds came from mixed ADD/UPDATE operations — some entries were new inserts, others were updates to existing rows. The array positions didn't align. Wrong memories got linked to wrong contacts. A user's mom's birthday got attached to their coworker. I only caught this because I was manually inspecting contact cards and something looked off. Fix: query the DB for content→ID mappings directly instead of trusting array positions from a batch operation.
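The shape of that fix can be sketched like this. Instead of trusting that storedIds[i] lines up with personMemories[i], rebuild the mapping from the stored rows themselves, keyed by content. Type and function names here are hypothetical:

```typescript
// Sketch of the alignment fix: map each extracted memory back to its stored
// UUID by content, rather than by array position from a mixed ADD/UPDATE
// batch whose ordering can't be trusted.
type StoredRow = { id: string; content: string };

function mapMemoriesToIds(
  personMemories: string[],
  storedRows: StoredRow[],
): Map<string, string> {
  const byContent = new Map(storedRows.map((r) => [r.content, r.id]));
  const result = new Map<string, string>();
  for (const content of personMemories) {
    const id = byContent.get(content);
    if (id) result.set(content, id); // skip anything the batch didn't store
  }
  return result;
}
```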
Empty UUID arrays crash Postgres. If a contact has no linked memories yet and you pass '{}'::uuid[] into ANY(), Postgres throws a type error in some query plans. I had to split the linked-ID path and the fallback path into two separate queries.
The importance floor was subtle. GREATEST(0.1, importance * 0.5) was supposed to down-rank superseded memory fragments after consolidation. But 0.1 is still high enough that they show up in top-K retrieval alongside the consolidated profile. The user sees both the clean summary and the raw fragments it was built from. Changed the floor to 0.05.
CJK truncation: JavaScript's .slice(0, 100) counts UTF-16 code units, so it can split a surrogate-pair character (rare CJK ideographs, emoji) in half, producing garbage. Fix: [...text].slice(0, 100).join(''), which spreads the string into an array of code points first.
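A minimal demonstration of the difference, using a CJK Extension B character that needs two UTF-16 code units:

```typescript
// JS strings index UTF-16 code units, so .slice() can cut an astral-plane
// character in half. Spreading into an array first iterates by code point.
const text = "𠀋好"; // "𠀋" (U+2000B) is a surrogate pair: two code units

const broken = text.slice(0, 1); // lone high surrogate, renders as garbage
const safe = [...text].slice(0, 1).join(""); // the whole character "𠀋"
```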
The consolidator kept re-processing memories that had already been consolidated. Every 50 messages it would re-read all the fragments, re-generate the same profile, and re-write the same data. Fix: filter the source query with (metadata->>'consolidated_into') IS NULL.
What's Still Broken
Contact name matching is exact string match only. No fuzzy matching. If the user calls their friend "小红" in one message and "红红" in another, those are two different contacts.
Pronoun resolution doesn't exist. "她最近怎么样了" (how's she doing lately) doesn't resolve "她" to any specific contact. The contact card injection papers over this at the system prompt level — if the user mentioned 小红 by name recently, the model has the card and can infer who "她" refers to. But the memory search itself has no idea. If the user hasn't named the person in the current conversation, the card never gets injected and the model is on its own.
The consolidator re-reads all linked memories for every contact, every time it runs, so total work grows quadratically over the system's lifetime. Fine at 50 memories. Questionable at 500.
PostgreSQL's built-in text search tokenizer handles Chinese poorly. to_tsvector's word segmentation on Chinese text is essentially useless without a plugin like pg_jieba or zhparser, neither of which I've set up.
There's a race condition in memory dedup. Two concurrent extraction runs can both insert the same memory before either's dedup check sees the other's insert.
At 20 messages per day, a user accumulates roughly 600 memories per year. After three years, that's 1,800 memories. The O(N²) consolidator will start hurting well before then. I know this needs a fix — probably incremental consolidation that only processes new memories since the last run. Haven't built it yet.
For the broader system architecture, see Part 3. For the reasoning behind the v2 pivot, see Part 1.