A Cat Entertainer, Just A Tech Blog

So What If the Benchmark Scores Are High

Mar 5, 2026

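GPT 5.4 leads across every benchmark. But when I threw the same complex product strategy question at both models, the gulf between benchmark scores and real output quality was shocking.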

AI LLM Claude GPT type:essay

GPT 5.4 vs Opus 4.6: Why Benchmarks Stopped Mattering

Mar 5, 2026

GPT 5.4 dominates every benchmark. But when I gave both models the same complex product strategy prompt, the gap between benchmark scores and real-world output was staggering. Here's what actually happened.

AI LLM Claude GPT type:essay
