My Transcriber Heard Five People in a Two-Person Conversation
Transcription is the easy half. The hard half is knowing who spoke. Chasing accurate speaker diarization, I exhausted Gemini chunking, Senko, and Doubao's auc models before landing on Volcano's 妙记 (Lark Minutes ASR), which nailed the speaker count on every two-person recording. Plus the gotchas: Volcano's two auth systems and a cross-border upload throttled to 34 KB/s.