Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information
September 18, 2023 Β· Declared Dead Β· π IEEE International Conference on Acoustics, Speech, and Signal Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Tianjun Mao, Shansong Liu, Yunxuan Zhang, Dian Li, Ying Shan
arXiv ID
2309.09421
Category
cs.MM: Multimedia
Citations
3
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked
3 months ago
Abstract
Background music (BGM) can enhance the video's emotion. However, selecting an appropriate BGM often requires domain knowledge. This has led to the development of video-music retrieval techniques. Most existing approaches utilize pretrained video/music feature extractors trained with different target sets to obtain average video/music-level embeddings. The drawbacks are two-fold. One is that different target sets for video/music pretraining may cause the generated embeddings difficult to match. The second is that the underlying temporal correlation between video and music is ignored. In this paper, our proposed approach leverages a unified target set to perform video/music pretraining and produces clip-level embeddings to preserve temporal information. The downstream cross-modal matching is based on the clip-level features with embedded music rhythm and optical flow information. Experiments demonstrate that our proposed method can achieve superior performance over the state-of-the-art methods by a significant margin.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Multimedia
π
π
Old Age
R.I.P.
π»
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
π
π
The Cartographer
A Comprehensive Survey on Cross-modal Retrieval
π
π
The Cartographer
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
π»
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
R.I.P.
π»
Ghosted
Video Generation From Text
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted