I Open-Sourced My Entire ~/.claude
Six months of accumulated Claude Code configuration. ~50 custom skills, ~50 subagent definitions, ~40 slash commands, a dozen automation hooks, a global CLAUDE.md, and the sync infrastructure I just used to publish all of this safely.
The repo: github.com/xingfanxia/ax-claude-config-public.
Here's why I did it, what's actually in there, and the workflow philosophy I picked up along the way.
The trigger
The first time I ran the publish script, it failed.
Not because of an OpenAI key or a GitHub token. The audit grep caught a Python `.pyc` file in `scripts/__pycache__/` — compiled bytecode with my home directory path baked into its constant pool. I couldn't see it with my eyes. The tooling did.
That was the moment I realized the real point of building all this. The configuration is interesting, but the audit-and-refuse mechanism around it is more interesting. I'd built a system that protects me from my own carelessness, and watched it work.
Three minutes later I had `**/*.pyc` in the denylist, the script passed, and 325 files went public.
What is ~/.claude?
If you've used Claude Code (Anthropic's terminal CLI), you know it has a config directory called ~/.claude. By default this is mostly empty. You install a plugin or two, edit settings.json, and most people stop there.
I didn't stop there.
Six months ago that directory was empty. Today it has 51 skills, 51 subagents, 42 slash commands, 13 hooks, a CLAUDE.md, an engineering ruleset, and a custom test suite. None of this was planned. Each piece was added the day I needed it.
Three examples:

- `khazix-writer` is a Chinese long-form writing skill modeled on a popular WeChat author. A four-layer self-check (hard rules, style match, content quality, "is this a real human" final pass) tries to drag AI prose back toward something that sounds alive.
- `huashu-design` is HTML hi-fi prototyping plus slide decks plus animation, with five lineages and 20 design philosophies baked in (Pentagram for information architecture, Field.io for kinetic poetry, Kenya Hara for Eastern minimalism, Sagmeister for experimental boldness), plus an explicit anti-AI-slop checklist to ban purple-pink gradients and Inter everywhere.
- `banxian-skill` is a three-system Chinese divination skill (xiao liu ren, mei hua yi shu, liu yao). Claude picks the right method for the question, a Python engine generates real hexagrams, and a knowledge base from a popular fortune-telling WeChat account drives the interpretation.
None of this was a roadmap. Just a pile of accidents that turned out to be load-bearing.
Why open source it
Three real reasons. None of them are growth.
First, friends and coworkers kept asking. I'd be in a meeting with banxian-skill running in another tab, someone would notice the hexagram on screen, and ask what that was. After explaining for the third time, I figured I should just have a link to send.
Second, I genuinely don't know which skills I still use. Putting them all in one indexable repo forced me to walk through them. INDEX.md is auto-generated from frontmatter, so building it was the first time I'd seen all the names in one place. About a dozen are dead and need pruning. I wouldn't have noticed without doing this.
Third, I want to be told I'm wrong. The configuration definitely has stupid parts. Open-sourcing it is how I get those parts identified.
What's actually in there
The interesting stuff is in skills/. Roughly four categories:
Writing skills. khazix-writer (oral/narrative Chinese long-form), wandian-writer (cool data-driven industry analysis in the style of LatePost), wechat-article-writer (corporate WeChat content), narrative-research (horizontal-vertical analysis methodology that produces a 10k-word PDF report on any topic). Each one solves a different writing pain.
Design skills. huashu-design (HTML prototypes, slide decks, animations, design exploration), frontend-design (production UI in real codebases), design-md (drop in a known brand's design system from a curated collection of 66), design-consultation (build a design system from zero through structured Q&A). I use these less often but every time they save me.
Engineering skills. big-task (workflow router that picks process tier by risk not file count), audit-fix-loop (iterative review until clean), codebase-sweep (full repo audit), plan-design-review (design completeness check before implementation), pr-fix-loop (autonomous response to PR review feedback until CI green). I use these every day.
Weird skills. banxian-skill (Chinese divination), pet-forge (custom desk-pet animations), jewelry-marketing (XHS bundle generator for jewelry merchants), photoshoot (influencer-style photo sets with reference image input), notion-infographic (Notion-style hand-drawn infographics from any outline).
The weird ones are the soul of this config. The engineering tools you can find anywhere. But "my friend sells jewelry and needs Xiaohongshu marketing material from one product photo" — that's a skill only I would write.
Skills, subagents, slash commands
These are three different things and they compose.
- A skill is documentation. It tells Claude how to do something.
- A subagent is a worker. It takes a task, runs in isolated context, returns a result.
- A slash command is a trigger. Type `/big-task` and the whole workflow fires.
Concrete example. I tell Claude "implement a new API endpoint." It autonomously:
- Fires the `big-task` skill, which decides this is tier 3 (medium risk, light process)
- Runs the GSD workflow: discuss → plan → execute
- Spawns a `gsd-plan-checker` subagent for goal-backward verification of the plan
- Spawns parallel `gsd-executor` subagents for independent sub-tasks
- Spawns a `code-reviewer` subagent on the diff
- Spawns a `security-reviewer` subagent if the code touches user input or secrets
- Triggers the `neat-freak` skill to sync docs and memory against the changes
- Spawns a `doc-updater` subagent to refresh the codemap
I write one sentence. The whole pipeline runs on its own. Five years ago this would have been science fiction.
The workflow philosophy
This is the part I actually want to convey. Seven principles I converged on, abbreviated SIMPLED.
- S — Surgical. Don't touch logic outside the task.
- I — Investigate first. Read source before answering. State root cause before fix.
- M — Maintainability. Small files, clear structure, modular design.
- P — Purpose-driven. Question assumptions. Reason from first principles. Tests first.
- L — Lean. Three similar lines beats a premature abstraction.
- E — Extreme ownership. Pre-existing lint errors and flaky tests are your problem now. Don't leave the codebase worse than you found it.
- D — Delegate for parallelism, not context protection. 1M context holds the noise. Spawn subagents for genuine parallelism or model specialization (Haiku for cheap lookups), not to "protect" main context.
Each of these came from a specific failure. The D one is the most recent. Early on I'd spawn subagents constantly, justifying it as "context protection." Then I noticed the subagent reports came back filtered and distorted, and I'd have to re-derive the missing details anyway. The 1M context window holds whole sessions of noise — protect it less, use it more. Spawn for actual reasons.
Risk-based task routing
The big-task skill is a workflow router. Earlier versions of my prompts asked Claude "is this a small or large task?" and it would always answer by file count: 1-2 files is small, 10+ files is large.
That's the wrong axis.
Risk and volume are different. A schema migration touching one file is high risk. A typo fix touching 50 files is low risk. A concurrency change to a single function is high risk. The skill now routes by signal, not size:
- Schema/migration → tier 3-4 (full GSD)
- Concurrency, atomicity, protocol → tier 4
- Auth, secrets, payment → tier 4
- New endpoint, component, system → tier 3
- Bug fix → tier 1-2 (TDD red→green)
- Typo, config, one-liner → tier 1 (just do it)
Behind every line in that table is a specific incident. The whole thing is calibrated by past mistakes.
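The routing itself is nothing exotic. Here's a minimal Python sketch of the idea; the patterns, tiers, and function are illustrative, not the actual big-task internals:

```python
import re

# Illustrative risk signals, roughly mirroring the table above.
# Patterns and tiers are a sketch, not the actual big-task skill.
SIGNALS = [
    (r"\b(auth|secret|payment|token)\b", 4),
    (r"\b(concurren|atomic|lock|protocol)\w*", 4),
    (r"\b(schema|migration)\b", 3),
    (r"\b(endpoint|component)\b", 3),
    (r"\b(bug|fix|regression)\b", 2),
    (r"\b(typo|one-liner)\b", 1),
]

def route(task: str) -> int:
    """Pick a process tier by risk signal, never by file count."""
    matched = [tier for pattern, tier in SIGNALS if re.search(pattern, task, re.I)]
    return max(matched, default=2)  # unknown work still gets light TDD

print(route("typo in the README"))                      # -> 1
print(route("add locking around the payment webhook"))  # -> 4
```

The load-bearing part is the `max()`: one high-risk signal anywhere escalates the whole task, regardless of how small the diff is.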
GSD: discuss, plan, execute
A separate plugin I've been running for months, adapted into my main loop. Three phases:
Discuss surfaces gray areas. Decisions like "GET or POST," "cache or DB," "fail open or closed" — these get nailed down here, before the plan exists. By the time plan-writing starts, every TBD is resolved.
Plan writes the executable steps. Not "implement X" but "at file.py:42, add this code." Every step has a specific file, code snippet, and command. If a plan has placeholders, it's not done.
Execute runs the steps. Each step ends in a commit. If something fails, you don't pile fixes on top — you go back to discuss to redo the hypothesis.
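To make "no placeholders" concrete: picture each plan step as a record where every field must be filled in before execute starts. A hypothetical shape, not the actual GSD plan format:

```python
from dataclasses import dataclass

@dataclass
class PlanStep:
    """One executable step. Hypothetical schema, not the real GSD format."""
    file: str            # exact target, e.g. "api/routes.py"
    anchor: str          # where in the file, e.g. "after def get_user"
    snippet: str         # the literal code to add or change
    verify_command: str  # how to check the step, e.g. "pytest tests/test_routes.py"
    commit_message: str  # each step ends in a commit

def is_done(step: PlanStep) -> bool:
    # A plan containing placeholders is not done.
    return all(v and "TBD" not in v for v in vars(step).values())
```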
The point isn't the structure. It's that GSD forces decisions earlier than my instinct wants. My old failure mode was "start coding, hit a fundamental problem at 80% completion, throw away the work." GSD pushes that fundamental problem into discuss, where it costs nothing to resolve.
Neat-freak: enforced documentation hygiene
A Stop hook runs on every session. After every meaningful change, it reconciles code against CLAUDE.md, README.md, and docs/. If the code added a new env var that the README doesn't mention, it gets fixed. If a memory file points at a now-deleted skill, it gets updated.
The skill exists because I'm afraid of documentation rot.
Code gets maintained because tests fail when it breaks. Documentation doesn't have automated feedback. But in the agent era, this matters more than it used to. Agents read CLAUDE.md to understand the project. If CLAUDE.md is wrong, the agent works on a false premise. Drift between code and docs is now a correctness problem, not just a tidiness problem.
neat-freak is how I keep my docs honest without having to remember to.
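For flavor, here's a toy version of one such check, flagging env vars referenced in code but absent from the README. A sketch of the category of check, not the actual neat-freak implementation:

```python
import pathlib
import re
import sys

root = pathlib.Path(".")
readme = (root / "README.md").read_text(encoding="utf-8")

# Collect env var names referenced in Python sources.
env_vars: set[str] = set()
for py in root.rglob("*.py"):
    src = py.read_text(encoding="utf-8", errors="ignore")
    env_vars.update(re.findall(r"os\.(?:environ\[|getenv\()[\"']([A-Z0-9_]+)", src))

missing = sorted(v for v in env_vars if v not in readme)
if missing:
    print("README.md does not mention:", ", ".join(missing), file=sys.stderr)
    sys.exit(1)  # nonzero exit: surface the drift instead of passing silently
```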
The rule I added today: three-pass critique
I'd given Claude a design spec to write. It wrote one. I read it, said "looks good, ship it." Claude wrote the implementation plan, started executing. Mid-way through I noticed obvious problems in the plan.
"You did a self-review, right? How did you miss these?"
"I did one self-review pass. It said it was good."
I asked it to critique again. Found a pile of P0 issues immediately. So I asked it to do three passes, with different lenses each time:
- Architecture / correctness — does the design actually work? Are claims verified empirically?
- Security / completeness — are failure modes enumerated? Industry-standard checks present?
- Process / human factors / UX — will a real human actually use this without footguns?
That round produced a stack of P0/P1/P2 findings: an over-engineered URL token swap that empirical grep showed had zero actual occurrences. An audit grep that didn't include gitleaks, missing a whole class of secret patterns. Sync direction undeclared (one-way mirror or bidirectional?), creating ambiguity that would have caused data loss later.
None of this surfaced in the original "self-review."
I added the rule globally. "Always run three distinct critique passes on any plan or spec before moving on. Empirically grep before you trust any claim about file content."
It's a new rule, but it's already saved my work in this very session.
"AX doesn't review"
Another rule I encoded as memory.
AX never reviews code or long-form artifacts (plans, specs, designs, roadmaps). The agent runs multipass review on everything itself. AX only reviews concise sign-off summaries (≤10 lines) or critical decisions.
Why? Because I cannot read a 1000-line PLAN.md. I tried. I'd get halfway through, glaze over, and rubber-stamp it. The agent would interpret that as approval. Bugs ensued.
Once I admitted to myself that I wasn't going to read it, the obvious move was to make the agent run rigorous self-review instead of relying on my eyeballs. I read the 10-line sign-off summary at the bottom — objective, scope, risks, acceptance criteria. That I'll read.
The agent's work quality went up after I made this explicit. It knew there was no human safety net, so its own review got stricter.
The sync infrastructure
The thing that actually gets the config from private to public is straightforward: one Bash orchestrator and three Python helpers.
- `sync-public.sh` — Bash orchestrator. Phase A preflight, Phase B copy + sanitize, Phase B.5 prune, Phase C audit.
- `sanitize.py` — applies regex substitution rules from `sanitize-rules.txt`.
- `build-settings-template.py` — transforms `settings.json` into a sanitized public template.
- `build-index.py` — auto-generates `INDEX.md` from skill/agent/command frontmatter.
About 600 lines total, with 23 pytest unit tests.
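The sanitizer is conceptually the simplest piece: regex rules in, substituted text out. A minimal sketch, assuming a `pattern => replacement` rule syntax (the real sanitize-rules.txt format may differ):

```python
import pathlib
import re

def load_rules(rules_file: str) -> list[tuple[str, str]]:
    """Each non-comment line: regex, ' => ', replacement. Assumed syntax."""
    rules = []
    for line in pathlib.Path(rules_file).read_text().splitlines():
        if line.strip() and not line.startswith("#"):
            pattern, replacement = line.split(" => ", 1)
            rules.append((pattern, replacement))
    return rules

def sanitize(text: str, rules: list[tuple[str, str]]) -> str:
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# A rules file in this assumed syntax might contain:
#   /Users/[a-z]+ => /Users/yourname
#   [A-Za-z0-9._%+-]+@computelabs\.ai => user@example.com
```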
The interesting part is the two-tier audit.
Tier 1 — identity-term grep. Searches for `xingfanxia`, `computelabs`, `x@computelabs`, `/Users/xingfanxia`. Any hit, exit 1.
Tier 2 — gitleaks. Industry-standard secret scanner. Catches `ghp_*`, `github_pat_*`, `sk-*`, `sk-ant-*`, `AKIA*`, Slack tokens, Telegram bot tokens, Stripe keys, and generic high-entropy strings. Any finding, exit 1.
The whole pipeline is fail-closed. The script refuses to push if either tier reports anything.
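Condensed, the fail-closed shape looks roughly like this. A sketch assuming gitleaks v8's `detect --no-git` directory mode; the real sync-public.sh does more:

```python
import subprocess
import sys

STAGING = "public-staging"  # assumed staging dir name
IDENTITY_TERMS = ["xingfanxia", "computelabs", "x@computelabs", "/Users/xingfanxia"]

# Tier 1: identity-term grep. grep -r also matches inside binary files,
# which is exactly how the .pyc bytecode leak surfaced.
for term in IDENTITY_TERMS:
    hit = subprocess.run(["grep", "-rF", term, STAGING], capture_output=True, text=True)
    if hit.returncode == 0:  # grep exits 0 when it found something
        print(f"AUDIT FAIL: '{term}' found:\n{hit.stdout}", file=sys.stderr)
        sys.exit(1)

# Tier 2: gitleaks in directory mode (v8: --no-git scans file contents
# rather than commit history). Nonzero exit means findings.
leaks = subprocess.run(["gitleaks", "detect", "--source", STAGING, "--no-git"])
if leaks.returncode != 0:
    print("AUDIT FAIL: gitleaks reported findings", file=sys.stderr)
    sys.exit(1)

print("Audit clean. Safe to push.")
```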
Today's first bootstrap actually caught three real leaks I would not have noticed:
- `gws-gmail-forward` had my personal Gmail handle baked into a directory path constant. Never thought twice about it before — I wrote that skill for myself.
- `omni-search` and `omni-doc` referenced our internal API endpoint at omni.computelabs.ai. Both skills are useless without internal access anyway.
- The Python `__pycache__` issue I mentioned at the top — bytecode strings I literally couldn't grep for visually.
The fix for the first two: add them to the denylist. They're personal infrastructure, not portable. The fix for the third: add `**/*.pyc` and `**/__pycache__/` to the denylist.
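The prune phase is just glob deletion over the staging copy before the audit runs. A sketch of the idea; the denylist file name and format are assumptions:

```python
import pathlib
import shutil

staging = pathlib.Path("public-staging")  # assumed staging dir name

# One glob per line, e.g. "**/*.pyc", "**/__pycache__/", "skills/gws-gmail-forward/"
patterns = [
    line.strip()
    for line in pathlib.Path("denylist.txt").read_text().splitlines()
    if line.strip() and not line.startswith("#")
]

for pattern in patterns:
    # Materialize the matches before deleting, so we don't mutate
    # the tree while the glob generator is still walking it.
    for match in list(staging.glob(pattern.rstrip("/"))):
        if match.is_dir():
            shutil.rmtree(match)   # whole directories, e.g. __pycache__
        else:
            match.unlink()         # single files, e.g. *.pyc
```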
After those changes: 4/4 Tier 1 pass, 0 gitleaks findings, 325 files clean. Live on GitHub.
That fail-closed audit is my biggest vote of confidence in the AI era's "automation." Build the safety net. Trust the safety net. When it fires, listen.
Who this is for, and who it isn't
Is for:
- People who want to build their own Claude Code config but don't know where to start. This is a working reference you can fork, copy, mock, or steal from.
- Developers curious about what heavy Claude Code usage looks like. This is the output of using it as a daily driver for six months.
- Researchers studying agentic tooling. The skill/subagent/hook composition pattern is not yet standardized. This is one specific instance of it that survives daily use.
Is not for:
- People expecting `git clone` to give them a working setup. The README says "portfolio + reference, not turnkey install" in paragraph one. Many skills depend on external API keys (OpenAI, Anthropic, Gemini, Notion, Linear), MCP servers (claude-in-chrome, context7), or specific plugin installs. Cloning gets you a starting point, not a runtime.
- People looking for "best practices." This config is my personal taste. It includes Chinese long-form writing, fortune-telling, desk pets, jewelry marketing — none of which probably overlap with your work.
- People looking for a Claude Code tutorial. This is a reference, not a guide.
Why I think this matters
Years ago when I was first getting serious about the command line, someone asked why I spent so much time configuring vim, zsh, and tmux. "These are one-time investments, not worth it." I didn't have a great answer at the time.
Looking back now: I type fast. I grep accurately. I can ssh into any random server and feel at home. The compounding return on those configs is everywhere in my work.
~/.claude is to the agent era what ~/.vim and ~/.zshrc were to the command-line era.
It's a personal work environment that gets continuously maintained, optimized, and inherited across machines. It is not a tool. It is the shape of your tools.
When the floor for AI tool access drops to zero — when anyone can prompt their way to a working app — the differentiation moves to how you've configured the tool. Same Claude Code, different config, different output.
I didn't fully feel the weight of this until I pushed ~/.claude public this morning. Looking at the GitHub INDEX.md: 50+ skills, 50+ agents, 40+ commands, all neatly listed.
Not 50 isolated tools. A digital workshop.
I spent six months building it. It now compresses six months of work into an hour or two.
Repo: github.com/xingfanxia/ax-claude-config-public
Fork it, mock it, tell me which parts are stupid. It's not done. None of this is ever done. But it survives — and the next move is to let it evolve.
AX