Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets
June 08, 2025 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
arXiv ID
2506.07076
Category
cs.MM: Multimedia
Citations
0
Venue
arXiv.org
Last Checked
4 months ago
Abstract
With the popularity of video-based user-generated content (UGC) on social media, harmony, as dictated by human perceptual principles, is critical in assessing the rhythmic consistency of audio-visual UGCs for better user engagement. In this work, we propose a novel harmony-aware GAN framework, following a specifically designed harmony evaluation strategy to enhance rhythmic synchronization in the automatic music-to-motion synthesis using a UGC dance dataset. This harmony strategy utilizes refined cross-modal beat detection to capture closely correlated audio and visual rhythms in an audio-visual pair. To mimic human attention mechanism, we introduce saliency-based beat weighting and interval-driven beat alignment, which ensures accurate harmony score estimation consistent with human perception. Building on this strategy, our model, employing efficient encoder-decoder and depth-lifting designs, is adversarially trained based on categorized musical meter segments to generate realistic and rhythmic 3D human motions. We further incorporate our harmony evaluation strategy as a weakly supervised perceptual constraint to flexibly guide the synchronized audio-visual rhythms during the generation process. Experimental results show that our proposed model significantly outperforms other leading music-to-motion methods in rhythmic harmony, both quantitatively and qualitatively, even with limited UGC training data. Live samples 15 can be watched at: https://youtu.be/tWwz7yq4aUs
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Multimedia
π
π
Old Age
R.I.P.
π»
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
π
π
The Cartographer
A Comprehensive Survey on Cross-modal Retrieval
π
π
The Cartographer
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
π»
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
R.I.P.
π»
Ghosted
Video Generation From Text
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted