Whispers Against the Noise: How Contrastive Decoding Tames Long‑Form ASR Hallucinations
Opening — Why this matters now

Speech recognition quietly sits at the center of modern AI infrastructure. Meetings are transcribed, podcasts indexed, customer calls summarized, and voice interfaces embedded in everything from smartphones to factory dashboards. But there is an awkward secret in the industry: long recordings break speech models. Even state‑of‑the‑art systems such as Whisper can produce fluent—but entirely fabricated—sentences when transcribing extended audio. These hallucinations often appear during silence, in noisy segments, or when context from earlier transcription segments propagates errors forward. ...