Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

November 26, 2024 · Declared Dead · 🏛 Spoken Language Technology Workshop

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Berlin Chen · arXiv ID: 2412.08651 · Category: eess.AS (Audio & Speech) · Cross-listed: cs.CL, cs.LG, cs.SD · Citations: 0 · Venue: Spoken Language Technology Workshop · Last checked: 3 months ago

Abstract

Code-switching, where multilingual speakers alternate between languages within a conversation, still poses significant challenges to end-to-end (E2E) automatic speech recognition (ASR) systems because it causes both acoustic and semantic confusion. ASR systems struggle to handle the rapid alternation of languages, which often leads to significant performance degradation. Our main contributions are at least threefold. First, we incorporate language identification (LID) information into several intermediate layers of the encoder to enrich the output embeddings with more detailed language information. Second, through a novel application of a language boundary alignment loss, the subsequent ASR modules can more effectively utilize the knowledge of internal language posteriors. Third, we explore the feasibility of using language posteriors to facilitate deep interaction between the shared encoder and the language-specific encoders. Comprehensive experiments on the SEAME corpus verify that our proposed method outperforms the prior-art method, disentangle-based mixture-of-experts (D-MoE), further sharpening the encoder's sensitivity to language.
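The abstract names two mechanisms without giving equations: injecting frame-level language posteriors into intermediate encoder layers, and using those posteriors to let a shared encoder interact with language-specific encoders. Since no code was found for this paper, what follows is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It borrows the pattern of self-conditioned CTC (project an intermediate softmax back into the hidden states and add it residually) and fuses encoders with a simple posterior-weighted sum. The helper names `inject_language_posterior` and `fuse_with_posteriors`, the weight shapes, and the two-language setup are all hypothetical. The language boundary alignment loss and the non-peaky CTC loss are not modeled.

```python
import math
import random

# Hypothetical sketch only: NOT the paper's method, which has no released code.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, vec):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def inject_language_posterior(frames, w_lid, w_back):
    """Enrich one intermediate layer's output with frame-level LID posteriors.

    frames: T x D hidden states; w_lid: L x D; w_back: D x L (all assumed shapes).
    Returns the enriched T x D states and the T x L posteriors.
    """
    enriched, posteriors = [], []
    for h in frames:
        p = softmax(matvec(w_lid, h))  # language posterior for this frame
        feedback = matvec(w_back, p)  # map the posterior back to D dims
        enriched.append([a + b for a, b in zip(h, feedback)])  # residual add
        posteriors.append(p)
    return enriched, posteriors


def fuse_with_posteriors(shared, specific, posteriors):
    """Add language-specific encoder outputs to the shared one, weighted by posterior.

    shared: T x D; specific: list of L outputs, each T x D; posteriors: T x L.
    """
    fused = []
    for t, h in enumerate(shared):
        out = list(h)
        for lang, p in enumerate(posteriors[t]):
            out = [o + p * s for o, s in zip(out, specific[lang][t])]
        fused.append(out)
    return fused


def random_matrix(rows, cols):
    """Random weights standing in for learned parameters (illustration only)."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    T, D, L = 4, 3, 2  # frames, model dim, languages (e.g. Mandarin, English)
    frames = random_matrix(T, D)
    enriched, post = inject_language_posterior(
        frames, random_matrix(L, D), random_matrix(D, L)
    )
    specific = [random_matrix(T, D) for _ in range(L)]
    fused = fuse_with_posteriors(enriched, specific, post)
    print([round(sum(p), 6) for p in post])  # rounded sums; each should be 1.0
    print(len(fused), len(fused[0]))  # T frames of D dims
```

The posterior-weighted sum is just one plausible reading of "deep interaction" between encoders; a gating network, cross-attention, or applying the injection at several layers (as the abstract describes) would be equally valid starting points.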

📜 Similar Papers

In the same crypt — Audio & Speech

R.I.P. 👻 Ghosted

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

Massimiliano Todisco, Xin Wang, ... (+8 more)

eess.AS 🏛 Interspeech 📚 736 cites 7 years ago

R.I.P. 👻 Ghosted

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Jean-Marc Valin, Jan Skoglund

eess.AS 🏛 ICASSP 📚 489 cites 7 years ago

R.I.P. 👻 Ghosted

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Quan Wang, Hannah Muckenhirn, ... (+8 more)

eess.AS 🏛 Interspeech 📚 413 cites 7 years ago

R.I.P. 👻 Ghosted

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

Andy T. Liu, Shang-Wen Li, Hung-yi Lee

eess.AS 🏛 IEEE/ACM TASLP 📚 397 cites 5 years ago

R.I.P. 👻 Ghosted

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

Andy T. Liu, Shu-wen Yang, ... (+3 more)

eess.AS 🏛 ICASSP 📚 393 cites 6 years ago

R.I.P. 👻 Ghosted

Utterance-level Aggregation For Speaker Recognition In The Wild

Weidi Xie, Arsha Nagrani, ... (+2 more)

eess.AS 🏛 ICASSP 📚 365 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago