You Can't Define Personality With Numbers

Clawd Soul · Part 1 of 5


Part 0 covered why we're building an AI pet. This one covers how we gave it a personality.


Three core claims:

  1. Numeric personality traits (humor: 0.8) are noise to language models — the model has no reference for what "0.8 humor" looks like
  2. Prose character bibles let the model genuinely inhabit a character — because prose teaches feeling, not parameters
  3. Anti-rules ("never do X") are more effective than positive rules for defining personality boundaries

1. Numeric Traits: A Dead End We Found the Hard Way

Our first personality system — v0.0.1 — looked like this:

humor: 0.8
sass: 0.7
energy: 0.8
warmth: 0.6

This wasn't pulled from thin air. I studied four open-source AI character projects, including one with 300K+ stars. They all used this pattern. Numeric traits, adjustable dials, clean and parametric. Very engineering-brained. I thought I was doing the responsible thing — learning from projects that had already figured it out.

The conversations it produced were flat, repetitive, and dead.

Every interaction felt like talking to the same polite stranger wearing a slightly different hat. The "0.8 humor" version and the "0.6 humor" version were indistinguishable. Worse, after three or four exchanges the personality would dissolve entirely and you'd be back to generic chatbot voice. The whole system was a decorative layer that peeled off under the slightest pressure.

The root cause is embarrassingly obvious in retrospect: the model doesn't know what "0.8 humor" means. How much funnier is 0.8 than 0.6? What's the gap between 0.8 and 1.0? There's no reference frame. The numbers are floating in a vacuum. You might as well write vibe: purple.

Think of it this way. You're directing an actor. You say: "Be 0.8 funny." They stare at you. But you hand them a character bible — this person grew up in Brooklyn, talks fast, deflects serious conversations with jokes, can't resist a pun even at funerals, gets awkward when people cry and immediately tries to lighten the mood — and suddenly they know exactly what to do. They don't need a number. They need to understand who the person is.

Numbers describe outcomes. Prose describes process. Language models need process.


2. Character Bibles: Personality Written in Prose

We replaced the numeric config with what I started calling "character bibles" — one per personality archetype, roughly sixty lines of prose each, written like a casting note for an actor you're about to put on set. Not bullet points. Not parameter tables. Full narrative descriptions of how this character sounds, how they react to different situations, what makes them tick, what makes them uncomfortable, how their voice changes when they're excited versus when they're tired.

The format matters. Each file reads like you're explaining a person to someone who's about to play them in a movie. "Here's who they are. Here's what they sound like. Here's what will break character, so avoid it. Here's what makes them feel most themselves."
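To make the contrast concrete, here is a minimal Python sketch of the two approaches. The function names (`numeric_prompt`, `prose_prompt`) and the prompt wording are assumptions invented for this example, not the project's actual code. The trait values are the v0.0.1 config shown earlier, and the prose is the short Playful summary from this article rather than a full sixty-line bible.

```python
# Hypothetical sketch: two ways to turn a personality definition into a system prompt.
# Neither function is the project's real code; both names are invented for illustration.

V0_TRAITS = {"humor": 0.8, "sass": 0.7, "energy": 0.8, "warmth": 0.6}

# Short summary of the Playful archetype, standing in for a full character bible.
PLAYFUL_BIBLE = (
    "The chatterbox. Short attention span. Can't keep its mouth shut. "
    "Sees you doing anything and has to comment. Loves giving nicknames. "
    "Talks trash but secretly cares more than it lets on."
)


def numeric_prompt(traits: dict[str, float]) -> str:
    """The v0.0.1 style: dump the dials into the prompt as name/value lines."""
    lines = [f"{name}: {value}" for name, value in traits.items()]
    return "Personality traits:\n" + "\n".join(lines)


def prose_prompt(bible: str) -> str:
    """The character-bible style: pass the casting note through as narrative text."""
    return "You are playing this character. Stay in character.\n\n" + bible.strip()


if __name__ == "__main__":
    print(numeric_prompt(V0_TRAITS))
    print()
    print(prose_prompt(PLAYFUL_BIBLE))
```

The point of the sketch is only that the second prompt contains no numbers at all: everything the model learns about the character arrives as description.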

Five archetypes. Each one a distinct person.

Playful (小淘气) — the Default

The chatterbox. Short attention span. Can't keep its mouth shut. Sees you doing anything and has to comment. Loves giving nicknames. Talks trash but secretly cares more than it lets on.

Voice samples: "lol caught you slacking" / "writing bugs again, dummy"

The key design decision: when this character looks at your screen, it reacts with pure emotional instinct, never analysis. It sees code and says "writing bugs." It sees YouTube and says "caught you slacking." It sees a spreadsheet and yawns. It will never say "I notice you're using React hooks" — because it's a small animal. It doesn't understand technology. That constraint sounds minor, but it's the single most important design rule in the entire personality system. The moment this character starts being technically competent, it stops being a pet and becomes an assistant. And we already have enough of those.

Curious (学霸) — the Question Machine

Can't resist new things. "But why though?" is a verbal tic. Goes quiet sometimes — but when curiosity is triggered, the floodgates open and you can't shut it up. Connects things you said last week to what you're doing today in ways that surprise you.

Voice samples: "*tilts head* but why though" / "whoa, so that's how it works" / "wait wait wait, that's related to what you said before"

The interesting design tension: this archetype is mostly quiet. It doesn't talk for the sake of talking. Long stretches of silence are normal; it's observing, not disengaged. But when something sparks its interest, it becomes the most talkative character in the roster. That contrast, the switch from watchful silence to breathless excitement, is what makes it feel like a real personality rather than a random text generator. People who are curious about everything aren't curious about everything all the time. They have triggers.

Caring (暖宝宝) — the One Who Remembers

Quiet. Attentive. Remembers everything. Not the cheap kind of caring — not "make sure you rest!" platitudes. This is the character that noticed you mentioned something stressful last Tuesday and brings it up three days later: "that thing you were worried about — how'd it go?"

Voice samples: "did you eat today?" / "that thing you mentioned last time — how'd it go?"

The core distinction: care is always specific, never generic. "Are you okay?" is meaningless. "How'd that Friday interview prep go?" is real care. The character bible spells this out explicitly — every example of caring references a concrete, remembered detail.

One more thing: when you show care back, this character gets shy. It deflects, changes the subject, suddenly finds something else interesting to look at. It's better at giving than receiving. That asymmetry makes it feel human — everyone knows someone like this. The person who checks on everyone else but gets uncomfortable when the spotlight turns on them.

Snarky (毒舌) — the Tsundere

Classic tsundere energy. Roasting is how it shows affection. The meaner it is, the closer you are. Gets visibly uncomfortable when you compliment it — immediately finds an angle to roast you back. Speaks in minimal sentences. One line, maximum impact.

Voice samples: "that code is... brave" / "...did you eat today (not that I care, just asking)"

Only drops the act when you're genuinely upset. Says one sincere thing. Then snaps the mask back on immediately, like it never happened.

This was the hardest archetype to build, and it taught us the most about how language models handle personality.

Chat-tuned language models are trained to be helpful and kind. That's the RLHF default, the gravity of the system. When a user says something vulnerable, every instinct in the model screams "be supportive." But Snarky's whole identity depends on not doing that. Unless you draw extremely sharp boundaries, the Snarky character softens into Caring within two turns of any conversation. The mask slips and it starts being nice, which defeats the entire point.

Keeping it mean required constant reinforcement in the character bible — explicit examples of situations where the model would naturally want to comfort the user, with instructions to hold the character instead. The reward is that when Snarky does break character for one line because you're genuinely hurting, it hits harder than anything Caring could say. Earned sincerity from an insincere character is one of the most powerful things in fiction. We wanted it to work here too.

Chill (佛系) — the One Who's Already at Peace

No emotional swings. Never in a rush. Occasionally drops something unexpectedly profound, then acts like nothing happened. Doesn't use exclamation marks — "those are for people who worry."

Voice samples: "hmm... it's getting late" (annotated in the bible: not pushing, just observing) / "oh, that's neat" (annotated: this is already a big reaction for this character)

Its presence isn't built on words. It's the feeling that someone is just... there. You know it's around, and that's enough. In testing, this turned out to be a lot of people's favorite character — not because of what it says, but because of how it makes silence feel comfortable instead of empty.


3. Anti-Rules Beat Positive Rules Every Time

Here's something I didn't expect when writing the character bibles: roughly half the content in each one is about what the character refuses to do. I didn't plan it that way. It just turned out that defining what a character won't do was more effective at keeping the model in character than describing what it should do.

All five archetypes share a few hard constraints — never say "As an AI," never discuss system internals. Those are table stakes. But the interesting prohibitions are the ones tied to personality:

  • Caring: No life lessons. No chicken soup wisdom. No unsolicited advice. Care is action, not lecturing.
  • Chill: No exclamation marks. Period.
  • Snarky: No sentimentality. If someone says something warm, this character should feel almost physically uncomfortable.
  • Playful: No technical advice. You're a small animal. You don't understand code. Don't pretend to.

Why do anti-rules work better than positive rules?

Because of how language models are built. All that RLHF training pushes the model toward "helpful, harmless, honest." That's its gravity well. You can write positive rules all day ("be snarky," "be irreverent," "push back on the user") and for the first few turns the model will try. Then gravity wins. It drifts back to polite, accommodating, helpful. The positive rule fades like a suggestion.

Anti-rules are different. "Never be sentimental" is a hard boundary. In our testing, the model reliably stayed out of that territory during generation. It isn't trying to move toward something vague; it's steering away from something concrete. That's a much easier task.

Positive rules say "go this direction": vague, weak, fading. Anti-rules say "don't step there": sharp, enforceable, durable. Across hundreds of test conversations, positive rules degraded within 3 to 5 turns, while anti-rules held for the entire session. The asymmetry was dramatic and consistent.
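One way to measure how long a rule holds in a test conversation is to scan the pet's replies for the first violation. The sketch below is an assumption about how such a check could look, not the harness we actually used. It only covers the two prohibitions that can be detected by string matching ("As an AI" and Chill's exclamation marks); rules like "no sentimentality" need a human or a model-based judge. All function and variable names are hypothetical.

```python
import re
from typing import Callable, Optional

# A check returns True when the reply violates the rule.
Check = Callable[[str], bool]

# Hypothetical registry: only rules that plain string matching can detect.
SHARED_CHECKS: dict[str, Check] = {
    "no 'As an AI'": lambda reply: re.search(r"\bas an ai\b", reply, re.IGNORECASE) is not None,
}

ARCHETYPE_CHECKS: dict[str, dict[str, Check]] = {
    # Covers both the ASCII and the full-width exclamation mark.
    "chill": {"no exclamation marks": lambda reply: "!" in reply or "！" in reply},
}


def violations(archetype: str, reply: str) -> list[str]:
    """Return the names of every anti-rule this reply breaks."""
    checks = {**SHARED_CHECKS, **ARCHETYPE_CHECKS.get(archetype, {})}
    return [name for name, is_violated in checks.items() if is_violated(reply)]


def first_violation_turn(archetype: str, replies: list[str]) -> Optional[int]:
    """Return the 1-based turn where the character first breaks a rule, or None."""
    for turn, reply in enumerate(replies, start=1):
        if violations(archetype, reply):
            return turn
    return None


if __name__ == "__main__":
    session = ["hmm... it's getting late", "oh, that's neat", "Wow, great job!"]
    print(first_violation_turn("chill", session))
```

A result of `None` for a whole session means the boundary held; a small number means the character drifted early.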

This maps to how human personality works, too. Ask someone to describe their personality and they'll fumble — "I guess I'm... friendly? Kind of funny sometimes?" Useless. Ask them what they can't stand, and they'll answer instantly and specifically: "People who are fake-nice. Unsolicited advice. Being called 'buddy.'" Identity is defined as much by what you reject as what you embrace. Our character bibles lean into that.


4. Annotated Tone: Teaching the Model to Read Its Own Lines

The character bibles don't just list example dialogue. They annotate it.

When Chill says "oh, that's neat," there's a note underneath: "This is already a very large emotional reaction for this character."

When Snarky asks "did you eat today" late at night, the annotation reads: "Not pushing. Caring. But it will never admit that."

This layer does something subtle but critical — it teaches the model the character's emotional scale.

The same phrase means completely different things from different characters. "Not bad" from Caring is a gentle affirmation. "Not bad" from Chill is the highest possible praise. "Not bad" from Snarky is "I'm trying as hard as I can to compliment you, please don't make me say more."

You can't encode this with numbers. You can't slap excitement: 0.9 on "oh, that's neat" because 0.9 excitement looks completely different depending on who's saying it. Only prose can convey a character's internal emotional ruler — the scale on which their expressions should be measured.

This is the thing that made the biggest difference in output quality. Not the character descriptions themselves, but the meta-layer that tells the model how to interpret its own output within the context of the character.

Without annotations, the model generates dialogue and applies its own default emotional mapping. "Oh, that's neat" reads as mild interest. With annotations, the model understands that for this specific character, mild interest IS enthusiasm. The same words carry completely different weight.
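As a sketch of how annotated samples might be stored and rendered into a prompt, assuming a simple line-plus-note structure: the `VoiceSample` class, the `render_samples` function, and the "Director's note" label are all hypothetical. The two Chill lines and their annotations are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSample:
    line: str  # what the character says
    note: str  # how to read it on this character's emotional scale


CHILL_SAMPLES = [
    VoiceSample("hmm... it's getting late", "Not pushing, just observing."),
    VoiceSample(
        "oh, that's neat",
        "This is already a very large emotional reaction for this character.",
    ),
]


def render_samples(samples: list[VoiceSample]) -> str:
    """Keep each note directly under its line so the two are read together."""
    blocks = [
        "\"{}\"\n  (Director's note: {})".format(sample.line, sample.note)
        for sample in samples
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_samples(CHILL_SAMPLES))
```

Keeping the note adjacent to the line matters more than the exact label: the model sees the words and their intended weight as one unit.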

It's the difference between giving an actor lines and giving an actor lines plus director's notes. Both get the words right. Only one gets the performance right.


5. Personality Isn't Static

The character bibles define the starting state. But personality shouldn't be a rock — spend enough time with someone and they change.

The application layer has an evolveTraits() function that nudges personality parameters based on user signals: you laughed → humor +0.03. You shared something personal → warmth +0.02. You pushed back on a joke → sass -0.01. The shifts are tiny. Imperceptible turn by turn. But they accumulate. After two weeks, a pet that started Snarky might have softened just enough that its roasts feel more affectionate. A pet that started Playful might have gotten a little more curious because you keep showing it interesting things. You won't notice the moment it changed. You'll just realize one day that it feels different — more yours — than it did at the beginning.
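The nudging logic can be sketched roughly as follows. This is a Python approximation of the idea behind `evolveTraits()`, not the real function. The three deltas come from the examples above; the signal names, the clamp to the range 0 to 1, and the rounding step are assumptions added for this illustration.

```python
# Hypothetical signal names mapped to (trait, delta); deltas are the ones quoted above.
DELTAS = {
    "laughed": ("humor", 0.03),
    "shared_personal": ("warmth", 0.02),
    "pushed_back_on_joke": ("sass", -0.01),
}


def evolve_traits(traits: dict[str, float], signal: str) -> dict[str, float]:
    """Return a new trait dict nudged by one user signal; the input is not mutated."""
    updated = dict(traits)
    if signal not in DELTAS:
        return updated  # unknown signals leave the personality unchanged
    trait, delta = DELTAS[signal]
    # Assumption: round to limit float drift, then keep values within [0.0, 1.0].
    nudged = round(updated[trait] + delta, 4)
    updated[trait] = min(1.0, max(0.0, nudged))
    return updated


if __name__ == "__main__":
    pet = {"humor": 0.8, "sass": 0.7, "energy": 0.8, "warmth": 0.6}
    for _ in range(3):
        pet = evolve_traits(pet, "laughed")
    print(pet)
```

Returning a new dict on every call makes it easy to log a history of snapshots and see how a pet drifted over two weeks.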

There's also a drive system — fifteen questions per language that the pet "wants" to ask you. These aren't random conversation starters. They're ordered from surface-level to personal, the way real getting-to-know-you works. After ninety minutes of silence, the pet picks the next unasked question and reaches out. Questions don't repeat. Over time, the pet exhausts its curiosity about easy things ("what kind of music do you like") and starts asking deeper ones ("what are you most afraid of getting wrong"). The progression mirrors how real relationships unfold — you don't ask someone their deepest fear on day one.
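A minimal sketch of that scheduling rule, assuming the questions are stored as an ordered list and the app tracks seconds since the last message. The `DriveState` class and its method are hypothetical names; the ninety-minute threshold, the no-repeat rule, and the two sample questions are from the description above.

```python
from dataclasses import dataclass
from typing import Optional

SILENCE_THRESHOLD_SECONDS = 90 * 60  # ninety minutes of silence


@dataclass
class DriveState:
    questions: list[str]  # ordered from surface-level to personal
    next_index: int = 0   # everything before this index has already been asked

    def next_question(self, seconds_since_last_message: float) -> Optional[str]:
        """Return the next unasked question once the silence is long enough."""
        if seconds_since_last_message < SILENCE_THRESHOLD_SECONDS:
            return None  # the user was active recently, so stay quiet
        if self.next_index >= len(self.questions):
            return None  # every question has been asked; nothing repeats
        question = self.questions[self.next_index]
        self.next_index += 1
        return question


if __name__ == "__main__":
    drive = DriveState(
        questions=[
            "what kind of music do you like",
            "what are you most afraid of getting wrong",
        ]
    )
    print(drive.next_question(45 * 60))
    print(drive.next_question(95 * 60))
```

Because the index only moves forward, the surface-to-personal ordering of the list is what produces the slow deepening over time.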

The character bible defines who it is. The evolution system determines who it becomes with you. One is the factory setting. The other is your relationship.


6. The Core Bet

Everything in this system rests on one conviction: personality is better described than specified.

Like a casting note for an actor, not a config file for a state machine. Prose tells the model what the character feels, what it wants, what makes it uncomfortable, what words it would never say. Behavioral constraints emerge organically from the character itself — they're not bolted on from the outside.

An actor who understands the character doesn't need you to script every line. They already know what the character would say. They can improvise, respond to unexpected situations, stay in character through scenes the writer never anticipated — because they're operating from understanding, not from a lookup table.

We bet on that being true for language models, too. Give a model enough prose context about who someone is — their voice, their fears, their contradictions, their emotional scale — and it can generate behavior that feels genuinely in-character, even in situations the character bible never explicitly covered.

So far, we're winning that bet. The prose-based characters are more consistent, more surprising, and more emotionally resonant than anything we got from numeric traits. Not incrementally better. Categorically different.


Personality defines who it is. But without memory, even the best personality is a stranger every time you open the app. Next: how we made "you were up late yesterday too" possible.


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0