Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

August 02, 2018 · Declared Dead · 🏛 Interspeech

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
arXiv ID: 1808.00665
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.SD, stat.ML
Citations: 17
Venue: Interspeech
Last Checked: 2 months ago
Abstract
We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder. We compared an ideal system, which uses manually corrected linguistic features (including phoneme and prosodic information) in both the training and test sets, against several other systems that use corrupted linguistic features. Both subjective and objective results demonstrate that corrupted linguistic features, especially those in the test set, degraded the ideal system's performance to a statistically significant degree because of the mismatch between the training and test conditions. Interestingly, while an utterance-level Turing test showed that listeners had difficulty differentiating synthetic speech from natural speech, it further indicated that adding noise to the linguistic features in the training set can partially reduce the effect of the mismatch, regularize the model, and help the system perform better when the linguistic features of the test set are noisy.
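The abstract does not say how the linguistic features were corrupted, so the following is only a minimal sketch of one possible approach: randomly flipping a fraction of categorical accent labels before training. The function name `corrupt_labels`, the `noise_rate` parameter, and the "H"/"L" label inventory are all hypothetical and are not taken from the paper.

```python
import random


def corrupt_labels(labels, noise_rate, label_set, seed=0):
    """Return a copy of `labels` in which each entry is replaced, with
    probability `noise_rate`, by a different label drawn uniformly from
    `label_set`. This is an illustrative assumption, not the authors' method."""
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = []
    for label in labels:
        if rng.random() < noise_rate:
            # Pick any label other than the current one.
            alternatives = [l for l in label_set if l != label]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(label)
    return noisy


# Hypothetical per-mora pitch-accent labels: "H" (high) and "L" (low).
clean = ["L", "H", "H", "L", "L", "H"]
noisy = corrupt_labels(clean, noise_rate=0.3, label_set=["H", "L"])
```

Under these assumptions, applying the same function to the training labels and to the test labels at different rates would reproduce the kind of matched and mismatched conditions the abstract compares.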