
The Assistant Should Not Stop Watching to Speak

TL;DR for operators: Live video assistants have a simple, embarrassing problem. Many of them stop watching while they talk. That is fine for a demo clip and disastrous for anything that claims to be real-time. The LyraV paper is useful because it treats this as a systems-control problem rather than a leaderboard beauty contest. The authors introduce Streaming Video-Language Synchrony: instead of processing frames, pausing to decode a full sentence, and only then resuming perception, the assistant interleaves incoming video frames with small chunks of generated tokens.[1] The operational goal is not “say more words.” It is “keep seeing while speaking.” ...
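To make the scheduling idea concrete, here is a minimal Python sketch of interleaving frames with small token chunks. It is an illustration under stated assumptions, not the LyraV implementation: the function names `interleave` and `longest_blind_run`, the `chunk_size` parameter, and the "one frame, then a chunk of tokens" ordering are all hypothetical, and real systems would deal with model state, timing, and latency that this toy ignores.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# An event is ("frame", payload) or ("token", payload).
Event = Tuple[str, object]


def interleave(frames: Iterable, tokens: Iterable, chunk_size: int) -> Iterator[Event]:
    """Hypothetical scheduler: ingest one frame, then emit at most
    `chunk_size` tokens, and repeat until the frame stream ends."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    token_iter = iter(tokens)
    for frame in frames:
        yield ("frame", frame)
        for tok in islice(token_iter, chunk_size):
            yield ("token", tok)
    # Tokens still pending once the stream has ended are flushed at the end.
    for tok in token_iter:
        yield ("token", tok)


def longest_blind_run(events: Iterable[Event]) -> int:
    """Longest stretch of consecutive tokens with no frame in between,
    used here as a rough proxy for how long the assistant is not watching."""
    longest = run = 0
    for kind, _ in events:
        run = run + 1 if kind == "token" else 0
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    frames = range(4)
    sentence = "abcdefgh"  # eight stand-in tokens

    # Small chunks: perception resumes after every two tokens.
    print(longest_blind_run(interleave(frames, sentence, chunk_size=2)))
    # One chunk as long as the sentence: the "stop watching to speak" pattern.
    print(longest_blind_run(interleave(frames, sentence, chunk_size=8)))
```

In this toy model, the chunk size bounds the longest blind stretch for as long as frames keep arriving, which is the property the "keep seeing while speaking" goal points at.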

June 29, 2026 · 19 min · Zelina