A Comprehensive Survey on Generative AI for Video-to-Music Generation
February 18, 2025 · The Cartographer · arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: A Comprehensive Survey on Generative AI for Video-to-Music Generation"
Evidence collected by the PWNC Scanner
Authors
Shulei Ji, Songruoyao Wu, Zihao Wang, Shuyu Li, Kejun Zhang
arXiv ID
2502.12489
Category
eess.AS: Audio & Speech
Cross-listed
cs.AI, cs.MM
Citations
5
Venue
arXiv.org
Last Checked
3 days ago
Abstract
The burgeoning growth of video-to-music generation can be attributed to the ascendancy of multimodal generative models. However, there is a lack of literature that comprehensively combs through the work in this field. To fill this gap, this paper presents a comprehensive review of video-to-music generation using deep generative AI techniques, focusing on three key components: conditioning input construction, conditioning mechanism, and music generation frameworks. We categorize existing approaches based on their designs for each component, clarifying the roles of different strategies. Preceding this, we provide a fine-grained categorization of video and music modalities, illustrating how different categories influence the design of components within the generation pipelines. Furthermore, we summarize available multimodal datasets and evaluation metrics while highlighting ongoing challenges in the field.
Similar Papers
In the same crypt: Audio & Speech
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction · R.I.P. 👻 Ghosted
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking · R.I.P. 👻 Ghosted
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech · R.I.P. 👻 Ghosted
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders · R.I.P. 👻 Ghosted