
The Vanity Formula: An Experiment in AI-Generated Social Moments

In July 2023, Miaoya Camera (妙鸭相机) launched in China and hit 600,000 daily active users in its first month. Upload a few selfies, pick a template, get AI-generated photos of yourself in different styles. It was fun, shareable, and novel.

By 2024, DAU had dropped to 36,000. A 94% decline.

The product worked exactly as designed. Users uploaded photos, got results, shared them once or twice, and never came back. The novelty wore off because there was nothing underneath it. No reason to return. No deepening engagement loop. Just a trick you'd seen before.

Every AI photo app since has followed the same trajectory. Launch, spike, decay. The pattern is so consistent it has a name in Chinese product circles: "烟花效应" — the fireworks effect. Bright, brief, gone.

I've been building ÉLAN (灵颜) — an AI photo and caption generation platform — and the first question I had to answer was: how do you avoid being the next firework?


The Real Problem Isn't Photos

The insight came from watching how people actually use AI-generated photos.

Nobody generates a photo and stares at it. They generate it to post it. The photo is a means to an end — the end is social presentation. Specifically, on Xiaohongshu (China's Instagram) and WeChat Moments (China's curated social feed).

And social presentation has rules. Very specific, culturally encoded rules that no existing AI photo app understands.

I ran a deep research survey across five dimensions — photo styles, poses, caption methodology, user psychology, and competitive UX. 42 supporting data points, cross-validated. The core finding was a single concept that drives everything:

不经意的优越感 — inadvertent superiority.


The Vanity Formula

Here's how it works on Xiaohongshu and WeChat Moments:

Caption describes a small thing (daydreaming, sipping coffee, wandering) + Photo reveals a big thing (Aman resort, Hermès bag, first-class cabin) = Effortless elegance

The Chinese internet calls this "凡尔赛" (Versailles) — the art of humble-bragging so subtly that the luxury feels incidental. It operates on three layers:

Scene-level. The backdrop isn't "a hotel pool" — it's an infinity pool at a recognizable luxury resort. But the person in the photo is "casually reading." The luxury is the setting, not the subject.

Outfit-level. Brand logos enter the frame "accidentally." You're holding an Hermès bag, but the photo's focus is on the distant skyline. The bag is just... there.

Caption-level. The text talks about emotions, not consumption. Not "checked into the Aman" but "难得什么都不想" (rare to think about nothing). The photo says luxury. The words say simplicity. The mismatch is the entire point.

This is what users actually want from AI photos. Not "a photo of me in a different outfit." They want a complete social media moment — photo plus caption plus the right emotional register — that projects the life they want to be seen living.

Most AI photo apps stop at the photo. ÉLAN is my attempt at delivering the whole moment.


What I Think Most Apps Miss

Looking at competitors, a few patterns stood out:

Too many parameters. Apps like Xingtu expect users to understand lighting, color temperature, and composition. The target user — an 18-to-35-year-old woman who wants beautiful photos with zero effort — bounces immediately. If she has to think, the product has failed.

No caption. This is the biggest gap. You generate a gorgeous photo, then stare at it trying to think of what to write. The photo-to-post conversion rate dies here. In our research, caption writing was identified as the highest-friction point in the entire workflow.

Novelty without depth. Miaoya's templates were one-dimensional: "ancient Chinese," "professional headshot," "magazine cover." You try each one once, you've seen everything. There's no reason to return because the templates don't connect to anything the user actually needs to do (i.e., post on social media with a specific emotional intent).

They feel like AI. The UI screams "tech tool" — sliders, progress bars, model names. The moment a user feels like they're operating software, the magic is gone. Nobody wants to think about inference when they're trying to look good.


Muse Cards: The Structural Answer

ÉLAN's core UX innovation is the Muse Card (灵感卡) — a curated, opinionated package that encodes an entire visual story.

Each Muse Card isn't a "template" or a "filter." It's a complete photoshoot brief:

  • Scene definition — specific location type, lighting conditions, time of day, with embedded luxury brand hints that appear incidental
  • Outfit configuration — clothing style, color palette, accessories with brand hints described as aesthetic direction rather than product placement
  • Pose library — 4-5 narrative shots (establishing, portrait, detail, candid, closing) that tell a visual story when posted as a set
  • Color grading — specific warmth, contrast, saturation targets that create a consistent "film" look
  • Caption templates — 3 switchable styles (Versailles/humble-brag, poetic, minimal) pre-written for each scene, adaptable to Xiaohongshu or WeChat Moments format

The user sees none of this complexity. They see a card with a preview image and a name like "无边泳池" (Infinity Pool) or "咖啡馆午后" (Café Afternoon). They tap it. They get 4 photos and a caption. Done.
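The photoshoot-brief structure above can be sketched as a simple data model. This is an illustrative sketch only — ÉLAN's actual schema isn't published, and every field name and value here (the `MuseCard` class, the grading keys, the sample card) is an assumption:

```python
from dataclasses import dataclass

@dataclass
class PoseShot:
    """One shot in a card's 4-5 shot narrative sequence."""
    role: str         # "establishing", "portrait", "detail", "candid", "closing"
    description: str  # pose and framing brief fed into the prompt

@dataclass
class MuseCard:
    name: str                          # e.g. "无边泳池" (Infinity Pool)
    category: str                      # e.g. "远方的光" (Luxury Travel)
    scene: str                         # location type, lighting, time of day
    outfit: str                        # style, palette, incidental brand hints
    poses: list[PoseShot]              # 4-5 shots that read as a set
    color_grading: dict[str, float]    # warmth / contrast / saturation targets
    caption_templates: dict[str, str]  # "versailles", "poetic", "minimal"

# Hypothetical example card; all values are illustrative.
infinity_pool = MuseCard(
    name="无边泳池",
    category="远方的光",
    scene="infinity pool at a cliffside resort, golden hour, soft backlight",
    outfit="linen resort wear, cream palette, designer tote at frame edge",
    poses=[
        PoseShot("establishing", "wide shot, subject reading at pool edge"),
        PoseShot("portrait", "waist-up, gazing past camera at the horizon"),
        PoseShot("detail", "hand on pool ledge, bag blurred in background"),
        PoseShot("candid", "mid-laugh, looking away from camera"),
    ],
    color_grading={"warmth": 0.7, "contrast": 0.4, "saturation": 0.5},
    caption_templates={
        "versailles": "难得什么都不想。",
        "poetic": "光落在水面上,也落在我身上。",
        "minimal": "Day off.",
    },
)
```

The point of the structure is that every field is authored by a curator, not exposed to the user — the card is the only surface area.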

Currently 18 cards across four categories:

Category | Cards | Vibe
远方的光 (Luxury Travel) | 6 | Resort pools, hotel suites, island escapes
城市漫游 (Urban) | 5 | Rooftop bars, shopping, city walks
日常诗意 (Artistic Life) | 4 | Cafés, galleries, reading rooms
时令之美 (Seasonal) | 3 | Cherry blossoms, autumn estates, winter onsen

The key design principle: users don't choose parameters — they choose outcomes. They see the life they want to project, not the knobs they need to turn.


The VANITY_DESIGN_INSTRUCTIONS

Under the hood, every Muse Card feeds into a 10-section prompt sent to Google Gemini's image generation model. One section is critical and unique to ÉLAN:

The VANITY_DESIGN_INSTRUCTIONS block tells the model: luxury items must appear incidental, never centered. The Hermès bag should be at the edge of the frame. The hotel logo should be slightly out of focus. The overall feeling should be "this is just my normal Tuesday" — not "look what I have."

This single prompt section is what separates ÉLAN from every "put me in a luxury setting" app. Without it, the AI centers the luxury — which makes the output look like an ad. With it, the luxury becomes backdrop — which makes it look like a life.
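To make the idea concrete, here is a minimal sketch of how such a prompt section might be composed. The real VANITY_DESIGN_INSTRUCTIONS text is not published; the rules below merely paraphrase the constraints described above, and the function name and format are assumptions:

```python
# Hypothetical sketch: rendering the vanity-design section of the prompt.
# These rules paraphrase the constraints described in the post; they are
# not ÉLAN's actual prompt text.
VANITY_RULES = [
    "Luxury items must appear incidental, never centered in the frame.",
    "Brand objects sit at the edge of the frame or slightly out of focus.",
    "The subject's attention is on something ordinary (a book, the view).",
    "Overall mood: 'this is just my normal Tuesday', not 'look what I have'.",
]

def vanity_design_section(scene: str, outfit: str) -> str:
    """Render one of the ~10 sections sent to the image model."""
    rules = "\n".join(f"- {r}" for r in VANITY_RULES)
    return (
        "## VANITY_DESIGN_INSTRUCTIONS\n"
        f"Scene: {scene}\n"
        f"Outfit: {outfit}\n"
        f"Composition rules:\n{rules}"
    )

print(vanity_design_section(
    "infinity pool at a cliffside resort, golden hour",
    "linen resort wear, designer tote at frame edge",
))
```

Encoding the constraint as explicit composition rules, rather than hoping the scene description implies it, is what keeps the model from centering the luxury.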


Three Taps, Not Three Minutes

The entire flow is three steps:

  1. 选张美照 (Select a photo) — upload one headshot (required) + one body shot (optional)
  2. 选个灵感 (Choose inspiration) — browse Muse Cards, tap one
  3. 光影创作 (Create) — receive 4 photos + caption, switch caption styles, save or share

No sliders. No dropdowns. No "advanced settings." The product philosophy is "把饭喂进嘴里" (put the food in their mouth) — if the user has to make a decision more complex than "which vibe do I want today," we've failed.

The caption generates in one of three styles — Versailles (humble-brag), poetic (artistic), or minimal (ultra-short) — and auto-adapts to Xiaohongshu format (longer, hashtags, emoji) or WeChat Moments format (shorter, no hashtags, intimate). One tap to switch. One tap to copy.


Avoiding the Firework

The Miaoya problem isn't technical — it's structural. Novelty decays. The question is what you replace it with.

ÉLAN's answer is content-as-a-service: the Muse Card library is a living catalog, not a static feature set. New cards drop weekly. Seasonal editions (cherry blossom season, Mid-Autumn Festival) create time-limited urgency. User voting influences which concepts get produced next. The engagement loop isn't "try the AI" — it's "what new looks are available this week."

This shifts the retention mechanism from novelty (which decays) to curation (which compounds). The more cards we ship, the more likely a user finds one that matches their current mood, trip, or social moment. The library becomes the product, not the AI.


What ÉLAN Is Really Selling

Not photos. Not AI. Not technology.

ÉLAN sells the feeling of being the person in those photos. The person who casually sips coffee at an Aman resort. Who "happens to" have a Birkin in the frame. Who posts a single understated line while the photo does all the talking.

The AI is invisible. The brand doesn't mention it. The UI doesn't show model names or generation progress in technical terms. The loading screen says "你的光,刚刚好" (your light, just right) — not "generating image with Gemini 3 Pro."

Because nobody wants to be seen using AI to look good. They want to look good — and the tool should be invisible.


This is Part 1 of the Building ÉLAN series. Part 2 will go under the hood — the Gemini multimodal prompting system, how VANITY_DESIGN_INSTRUCTIONS actually works in practice, and the SSE streaming architecture that makes generation feel instant.


This post is also available in Chinese (中文版).

Building ÉLAN · Part 1 of 5

© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0