When CTC Training Meets Acoustic Landmarks

November 05, 2018 Β· Declared Dead Β· πŸ› IEEE International Conference on Acoustics, Speech, and Signal Processing

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen arXiv ID 1811.02063 Category eess.AS: Audio & Speech Cross-listed cs.AI, cs.SD Citations 11 Venue IEEE International Conference on Acoustics, Speech, and Signal Processing Last Checked 3 months ago
Abstract
Connectionist temporal classification (CTC) provides an end-to-end acoustic model (AM) training strategy. CTC learns accurate AMs without time-aligned phonetic transcription, but sometimes fails to converge, especially in resource-constrained scenarios. In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks. We tailored a new set of acoustic landmarks to help CTC training converge more rapidly and smoothly while also reducing recognition error rates. We leveraged new target label sequences mixed with both phone and manner changes to guide CTC training. Experiments on TIMIT demonstrated that CTC based acoustic models converge significantly faster and smoother when they are augmented by acoustic landmarks. The models pretrained with mixed target labels can be further finetuned, resulting in phone error rates 8.72% below baseline on TIMIT. Consistent performance gain is also observed on WSJ (a larger corpus) and reduced TIMIT (smaller). With WSJ, we are the first to succeed in verifying the effectiveness of acoustic landmark theory on a mid-sized ASR task.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Audio & Speech

Died the same way β€” πŸ‘» Ghosted