Video Diffusion

TL;DR for operators

OmniAvatar is best read as a shift from “make the mouth move” to “make the person perform.” The paper introduces an audio-driven avatar video generation system that takes a reference image, an audio clip, and a text prompt, then generates facial and semi-body video with synchronised speech, adaptive body motion, and prompt-controlled scene elements.1 ...