Self-Supervised Learning of Context-Aware Pitch Prosody Representations

July 17, 2020 · Declared Dead

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Camille Noufi, Prateek Verma
arXiv ID: 2007.09060
Category: cs.SD (Sound)
Cross-listed: cs.CV, cs.IR, cs.LG, eess.AS
Citations: 1
Last Checked: 4 months ago
Abstract
In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both by a short sound token and by sonic patterns over a longer temporal window such as an entire recording. In this letter, we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen pitch contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15\% as compared to using traditional statistical contour features.
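The abstract compares learned representations against "traditional statistical contour features" but does not list those features. As a rough illustration only, the sketch below computes a few summary statistics of an $F_0$ contour. The function name `contour_stats`, the choice of statistics, the cents conversion, and the convention that unvoiced frames are coded as 0 Hz are all assumptions made here, not details taken from the paper.

```python
import math
import statistics


def contour_stats(f0_hz, ref_hz=440.0):
    """Summary statistics of a pitch contour (hypothetical baseline feature set)."""
    # Assumption: unvoiced frames are coded as 0 Hz and are dropped.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    # Convert Hz to cents relative to ref_hz so intervals are pitch-linear.
    cents = [1200.0 * math.log2(f / ref_hz) for f in voiced]

    # Frame-to-frame change; frames on either side of an unvoiced gap
    # are treated as adjacent in this simplified sketch.
    deltas = [abs(b - a) for a, b in zip(cents, cents[1:])]

    return {
        "mean_cents": statistics.fmean(cents),
        "std_cents": statistics.pstdev(cents),
        "range_cents": max(cents) - min(cents),
        "mean_abs_delta": statistics.fmean(deltas),
    }


# Toy contour: steady A4, one unvoiced frame, then a jump up one octave to A5.
example = [440.0, 440.0, 0.0, 880.0, 880.0]
features = contour_stats(example)
```

A fixed-length vector like this discards the temporal shape of the contour, which is the kind of information the abstract says the learned contextual representations are meant to capture.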
