Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

January 21, 2017 ยท Declared Dead ยท ๐Ÿ› IEEE Access

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Sungkyun Chang, Kyogu Lee arXiv ID 1701.06078 Category cs.SD: Sound Cross-listed cs.AI, cs.IR, cs.LG, eess.AS Citations 10 Venue IEEE Access Last Checked 3 months ago
Abstract
Most of the previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system that innately suffered from several difficulties to adapt the speech model to individual singers. A significant aspect missing in previous works is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel part used is more consistent than the consonant part. Based on this, our system first learns a discriminative subspace of vowel sequences, based on weighted symmetric non-negative matrix factorization (WS-NMF), by taking the self-similarity of a standard acoustic feature as an input. Then, we make use of canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that deploying this method after a pre-developed, unsupervised, singing source separation achieved more promising results than other state-of-the-art unsupervised approaches and an existing ASR-based system.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Sound

Died the same way โ€” ๐Ÿ‘ป Ghosted