High Benchmark Scores, So What?
GPT 5.4 leads on every benchmark. But when I gave both models the same complex product strategy prompt, the gap between benchmark scores and real-world output quality was staggering. Here's what actually happened.
LLMs love to talk. For a companion app, that's a problem: real people don't write essays when you text them. Here's how I built a hybrid system to control response length without killing personality.
Part 1 of 3: A big-tech engineer's journey into building PanPanMao -- an AI-powered Chinese metaphysics platform spanning 9 verticals, built in 29 days across 1,134 commits with zero domain knowledge.
Part 2 of 3: The technical build of PanPanMao across 4 phases -- monorepo consolidation, testing and branding, monetization, and the milestone rush. 1,134 commits, 85 endpoints, 9 verticals.
Part 3 of 3: Honest lessons from building PanPanMao. AI bridging the domain-knowledge gap (with caveats), product vs engineering, the agentic coding workflow, and the era of the solo AI-augmented builder.
© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0