DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models

December 15, 2023 ยท Entered Twilight ยท + Add venue

๐Ÿ’ค TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: LICENSE, README.md, checkpoints, configs, core, data, generators, inference_for_demo_video.py, media, output_video, requirements.txt, tmp

Authors Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng arXiv ID 2312.09767 Category cs.CV: Computer Vision Citations 8 Repository https://github.com/ali-vilab/dreamtalk โญ 1793 Last Checked 2 months ago
Abstract
Emotional talking head generation has attracted growing attention. Previous methods, which are mainly GAN-based, still struggle to consistently produce satisfactory results across diverse emotions and cannot conveniently specify personalized emotions. In this work, we leverage powerful diffusion models to address the issue and propose DreamTalk, a framework that employs meticulous design to unlock the potential of diffusion models in generating emotional talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network can consistently synthesize high-quality audio-driven face motions across diverse emotions. To enhance lip-motion accuracy and emotional fullness, we introduce a style-aware lip expert that can guide lip-sync while preserving emotion intensity. To more conveniently specify personalized emotions, a diffusion-based style predictor is utilized to predict the personalized emotion directly from the audio, eliminating the need for extra emotion reference. By this means, DreamTalk can consistently generate vivid talking faces across diverse emotions and conveniently specify personalized emotions. Extensive experiments validate DreamTalk's effectiveness and superiority. The code is available at https://github.com/ali-vilab/dreamtalk.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision