Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

November 17, 2022 · Declared Dead · 🏛 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Chunyu Qiang, Peng Yang, Hao Che, Jinba Xiao, Xiaorui Wang, Zhongyuan Wang
arXiv ID: 2211.09495
Category: cs.SD (Sound)
Cross-listed: cs.AI, cs.CL, eess.AS
Citations: 6
Venue: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Last Checked: 4 months ago

Abstract

Chinese Grapheme-to-Phoneme (G2P) conversion plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is polyphone disambiguation. Most previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for the task are scarce. In this paper we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation that utilizes a large amount of unlabeled text data. Inspired by the back-translation technique from machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert the predicted pronunciation back into text. A window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of each pseudo-label. We also design a data balance strategy to improve accuracy on typical polyphonic characters whose training data is scarce or has an imbalanced distribution. Experimental results show the effectiveness of the proposed back-translation-style data augmentation method.
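The pseudo-labeling loop described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names (`window_match`, `pseudo_label`), the model interfaces, the majority-vote reading of "multi-model scoring", the `window` and `min_votes` parameters, and the toy lookup-table models that stand in for trained G2P and P2G networks. It is not the authors' method.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical interfaces (not from the paper):
#   G2P: sentence -> one pinyin syllable per character
#   P2G: pinyin sequence -> reconstructed sentence
G2P = Callable[[str], List[str]]
P2G = Callable[[List[str]], str]


def window_match(original: str, reconstructed: str, idx: int, window: int = 1) -> bool:
    """Check that the round-trip text agrees with the original near position idx."""
    if len(original) != len(reconstructed):
        return False
    lo = max(0, idx - window)
    hi = min(len(original), idx + window + 1)
    return original[lo:hi] == reconstructed[lo:hi]


def pseudo_label(sentence: str, idx: int, g2p_models: List[G2P], p2g: P2G,
                 window: int = 1, min_votes: int = 2) -> Optional[str]:
    """Return a pseudo-label for sentence[idx], or None if it is rejected."""
    predictions = [g2p(sentence) for g2p in g2p_models]
    # Assumed scoring rule: simple vote across several G2P models.
    label, count = Counter(p[idx] for p in predictions).most_common(1)[0]
    if count < min_votes:
        return None
    # Round trip: feed a phoneme sequence carrying the winning label to P2G.
    phonemes = next(p for p in predictions if p[idx] == label)
    reconstructed = p2g(phonemes)
    return label if window_match(sentence, reconstructed, idx, window) else None


# Toy stand-ins for trained models (lookup tables, for illustration only).
def make_toy_g2p(reading_of_hang: str) -> G2P:
    table = {"我": "wo3", "在": "zai4", "银": "yin2", "行": reading_of_hang}
    return lambda s: [table[c] for c in s]


TOY_P2G_TABLE = {"wo3": "我", "zai4": "在", "yin2": "银", "hang2": "行", "xing2": "形"}


def toy_p2g(phonemes: List[str]) -> str:
    return "".join(TOY_P2G_TABLE[p] for p in phonemes)


good, bad = make_toy_g2p("hang2"), make_toy_g2p("xing2")
accepted = pseudo_label("我在银行", 3, [good, good, bad], toy_p2g)
rejected = pseudo_label("我在银行", 3, [bad, bad, good], toy_p2g)
```

In this toy setup, a wrong reading of 行 fails the round trip because the P2G stand-in maps it to a different character inside the comparison window, so the pseudo-label is discarded. A real system would replace the lookup tables with trained models and would likely need a more tolerant matching rule.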