FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

December 22, 2024 · Declared Dead · 🏛 Computer Vision and Pattern Recognition

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Tianyun Zhong, Chao Liang, Jianwen Jiang, Gaojie Lin, Jiaqi Yang, Zhou Zhao
arXiv ID: 2412.16915
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.GR, cs.SD, eess.AS
Citations: 5
Venue: Computer Vision and Pattern Recognition
Last Checked: 4 months ago

Abstract
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results. Distilled models exhibit reduced robustness with open-set input images and a decreased correlation between audio and video compared to teacher models, undermining the advantages of diffusion models. To address this, we propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation). We first designed a mixed-supervised loss to leverage data of varying quality and enhance the overall model capability as well as robustness. Additionally, we propose a multi-CFG distillation with learnable tokens to utilize the correlation between audio and reference image conditions, reducing the threefold inference runs caused by multi-CFG with acceptable quality degradation. Extensive experiments across multiple datasets show that FADA generates vivid videos comparable to recent diffusion model-based methods while achieving an NFE speedup of 4.17-12.5 times. Demos are available at our webpage http://fadavatar.github.io.
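
To make the "threefold inference runs" mentioned in the abstract concrete, here is a minimal sketch of one guided prediction under multi-condition classifier-free guidance (CFG). It is not taken from the FADA paper: the nested guidance formula, the `multi_cfg_step` function, the `CountingDenoiser` toy model, and the scale names `s_ref` and `s_audio` are all assumptions chosen for illustration. The point is only that each guided step costs three denoiser calls (3 NFE), which is the cost a multi-CFG distillation would aim to reduce.

```python
from typing import List, Optional, Sequence


class CountingDenoiser:
    """Toy stand-in for a diffusion network (hypothetical, not FADA's model).

    It only exists to count function evaluations (NFE) and to return
    values that are easy to check by hand.
    """

    def __init__(self) -> None:
        self.nfe = 0

    def __call__(
        self,
        x: Sequence[float],
        ref: Optional[Sequence[float]],
        audio: Optional[Sequence[float]],
    ) -> List[float]:
        self.nfe += 1
        ref_term = 0.0 if ref is None else sum(ref)
        audio_term = 0.0 if audio is None else sum(audio)
        return [xi + ref_term + audio_term for xi in x]


def multi_cfg_step(denoiser, x, ref, audio, s_ref=2.0, s_audio=3.0):
    """One guided noise prediction with two conditions.

    Assumed nested-guidance form (an illustration, not the paper's):
        eps = eps_uncond
              + s_ref   * (eps_ref  - eps_uncond)
              + s_audio * (eps_full - eps_ref)
    Each call to this function evaluates the denoiser three times.
    """
    eps_uncond = denoiser(x, None, None)   # no conditions
    eps_ref = denoiser(x, ref, None)       # reference image only
    eps_full = denoiser(x, ref, audio)     # reference image + audio
    return [
        u + s_ref * (r - u) + s_audio * (f - r)
        for u, r, f in zip(eps_uncond, eps_ref, eps_full)
    ]


if __name__ == "__main__":
    model = CountingDenoiser()
    eps = multi_cfg_step(model, [0.0, 1.0], ref=[1.0], audio=[2.0])
    print(eps, "NFE used:", model.nfe)
```

Under these assumptions, a sampler that runs this step N times spends 3N evaluations, whereas a student that produces the guided output in a single call per step would spend N.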

📜 Similar Papers

In the same crypt: Computer Vision

🌅 Old Age

Fast R-CNN

Ross Girshick

cs.CV · 🏛 ICCV · 📚 27.7K cites · 11 years ago

Died the same way: 👻 Ghosted