Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

November 26, 2024 · Declared Dead · 🏛 Spoken Language Technology Workshop

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Berlin Chen
arXiv ID: 2412.08651
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.LG, cs.SD
Citations: 0
Venue: Spoken Language Technology Workshop
Last Checked: 3 months ago

Abstract
Code-switching, where multilingual speakers alternate between languages during a conversation, still poses significant challenges to end-to-end (E2E) automatic speech recognition (ASR) systems because of both acoustic and semantic confusion. ASR systems struggle to handle the rapid alternation of languages, which often leads to significant performance degradation. Our main contributions are at least threefold. First, we incorporate language identification (LID) information into several intermediate layers of the encoder, aiming to enrich the output embeddings with more detailed language information. Second, through a novel application of a language boundary alignment loss, we enable the subsequent ASR modules to make more effective use of the internal language posteriors. Third, we explore the feasibility of using language posteriors to facilitate deep interaction between the shared encoder and the language-specific encoders. Through comprehensive experiments on the SEAME corpus, we verify that our proposed method outperforms the prior-art method, disentangle-based mixture-of-experts (D-MoE), further enhancing the encoder's acuity to languages.
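
The abstract gives no implementation details and no code is linked, so the following is only a rough sketch of what injecting language posteriors at an intermediate encoder layer and using them to combine language-specific encoders could look like. Every name here (`inject_language_posteriors`, `mix_language_encoders`, `w_lid`, `w_back`), the additive feedback, and the posterior-weighted sum are assumptions made for illustration, not the authors' method.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def inject_language_posteriors(hidden, w_lid, w_back):
    """Assumed injection step: per-frame LID posteriors are projected back
    to the model dimension and added to the hidden states.

    hidden: T frames, each a list of D floats
    w_lid:  D x L projection to language logits (hypothetical)
    w_back: L x D projection back to the model dimension (hypothetical)
    """
    posteriors = [softmax(matvec(frame, w_lid)) for frame in hidden]
    enriched = [
        [h + b for h, b in zip(frame, matvec(post, w_back))]
        for frame, post in zip(hidden, posteriors)
    ]
    return enriched, posteriors


def mix_language_encoders(outputs_per_language, posteriors):
    """Assumed interaction step: weight each language-specific encoder's
    frame output by that frame's language posterior and sum.

    outputs_per_language[l][t] is the D-dim output of encoder l at frame t.
    """
    mixed = []
    for t, post in enumerate(posteriors):
        dim = len(outputs_per_language[0][t])
        mixed.append([
            sum(p * outputs_per_language[l][t][d] for l, p in enumerate(post))
            for d in range(dim)
        ])
    return mixed


# Toy sizes: 5 frames, model dimension 4, 2 languages (all illustrative).
random.seed(0)
T, D, L = 5, 4, 2
hidden = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w_lid = [[random.gauss(0, 0.1) for _ in range(L)] for _ in range(D)]
w_back = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(L)]
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(T)] for _ in range(L)]

enriched, posteriors = inject_language_posteriors(hidden, w_lid, w_back)
mixed = mix_language_encoders(experts, posteriors)
```

In a real system the two projections would be learned jointly with the ASR objective and the injection would be repeated at several encoder layers. The sketch only shows the data flow for a single layer.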
