Whispers Against the Noise: How Contrastive Decoding Tames Long‑Form ASR Hallucinations

A transcript is usually treated as boring infrastructure. It sits underneath meeting summaries, call-center analytics, podcast search, earnings-call review, legal discovery, medical documentation, and the cheerful dashboard that tells managers everything is now “AI-powered.” Then the transcript invents a sentence. Not a typo. Not a small mishearing. A fluent, confident, context-shaped sentence that nobody said. In short clips, this is irritating. In long recordings, it becomes structural. One bad segment can become context for the next segment; the next segment inherits the mistake; and soon the system is not transcribing a recording so much as continuing a badly seeded story. ...

March 10, 2026 · 14 min · Zelina

When 30 Seconds Isn’t Enough: Engineering Long-Form Bangla ASR & Diarization

Call recordings are rude. They do not arrive in clean 15-second snippets. They run for minutes or hours. Speakers interrupt each other. Background noise leaks in. Someone moves away from the microphone. Someone else speaks over music, traffic, or a ceiling fan that apparently believes it deserves co-author status. This is where many speech AI demos quietly stop being impressive. ...

March 1, 2026 · 12 min · Zelina