Textless Speech-to-Music Retrieval Using Emotion Similarity

March 19, 2023 ยท Declared Dead ยท ๐Ÿ› IEEE International Conference on Acoustics, Speech, and Signal Processing

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors SeungHeon Doh, Minz Won, Keunwoo Choi, Juhan Nam arXiv ID 2303.10539 Category cs.SD: Sound Cross-listed cs.IR, cs.MM, eess.AS Citations 5 Venue IEEE International Conference on Acoustics, Speech, and Signal Processing Last Checked 3 months ago
Abstract
We introduce a framework that recommends music based on the emotions of speech. In content creation and daily life, speech contains information about human emotions, which can be enhanced by music. Our framework focuses on a cross-domain retrieval system to bridge the gap between speech and music via emotion labels. We explore different speech representations and report their impact on different speech types, including acting voice and wake-up words. We also propose an emotion similarity regularization term in cross-domain retrieval tasks. By incorporating the regularization term into training, similar speech-and-music pairs in the emotion space are closer in the joint embedding space. Our comprehensive experimental results show that the proposed model is effective in textless speech-to-music retrieval.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Sound

Died the same way โ€” ๐Ÿ‘ป Ghosted