CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

December 08, 2023 · Declared Dead · 🏛 ECCV Workshops

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ruihan Yang, Hannes Gamper, Sebastian Braun
arXiv ID: 2312.05412
Category: cs.LG (Machine Learning)
Cross-listed: cs.CV, cs.MM, cs.SD, eess.AS
Citations: 6
Venue: ECCV Workshops
Last Checked: 4 months ago

Abstract
We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio. We propose a joint contrastive training loss to improve the synchronization between visual and auditory occurrences. We present experiments on two datasets to evaluate the efficacy of our proposed model. The assessment of generation quality and alignment performance is carried out from various angles, encompassing both objective and subjective metrics. Our findings demonstrate that the proposed model outperforms the baseline in terms of quality and generation speed through introduction of our novel cross-modal easy fusion architectural block. Furthermore, the incorporation of the contrastive loss results in improvements in audio-visual alignment, particularly in the high-correlation video-to-audio generation task.
