Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

November 12, 2020 · Declared Dead · 🏛 arXiv.org

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Hamed Hemati, Damian Borth
arXiv ID: 2011.06392
Category: cs.SD (Sound)
Cross-listed: cs.LG, eess.AS
Citations: 11
Venue: arXiv.org
Last Checked: 3 months ago

Abstract
Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model for new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.
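
The abstract's "language-agnostic input" can be illustrated with a small sketch. This is not the authors' code (none is linked); it only assumes, following the paper's title, that text from every language is first transcribed to IPA and that all languages then share one symbol table. The function names `build_symbol_table` and `encode`, the padding symbol, and the two example transcriptions are hypothetical choices made for illustration.

```python
# Minimal sketch: one shared IPA symbol table for all languages.
# Assumption: utterances are already transcribed to IPA by some
# external grapheme-to-phoneme step that is not shown here.

PAD = "_"  # hypothetical padding symbol, reserved at index 0


def build_symbol_table(ipa_utterances):
    """Map every IPA character seen in any language to one integer ID."""
    symbols = sorted({ch for utt in ipa_utterances for ch in utt})
    return {sym: idx for idx, sym in enumerate([PAD] + symbols)}


def encode(ipa_text, table):
    """Turn an IPA string into the ID sequence a TTS encoder would consume."""
    return [table[ch] for ch in ipa_text]


# Illustrative transcriptions (approximate, for demonstration only).
english = "həˈloʊ"
german = "haˈloː"

table = build_symbol_table([english, german])
en_ids = encode(english, table)
de_ids = encode(german, table)

# Symbols shared across languages receive the same ID, so the model
# input carries no language-specific alphabet.
print(en_ids)
print(de_ids)
```

Because both languages index into the same table, a speaker from a new language can in principle reuse the input embeddings learned on the base languages, which is the kind of setup the abstract describes for low-resource adaptation.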