A Cat Entertainer

A Cat Entertainer, Just A Tech Blog

My Transcriber Heard Five People in a Two-Person Conversation

Jun 17, 2026

Transcription is easy. The hard half is knowing who spoke. Chasing accurate speaker diarization, I ran Gemini chunking, Senko, and Doubao's auc models into the ground before landing on Volcano's 妙记 (Lark Minutes ASR), which got the speaker count right on every two-person recording. Plus the gotchas: Volcano's two auth systems and a cross-border upload throttled to 34 KB/s.

AI Speech to Text Speaker Diarization Volcano Engine Automation type:builder-log
