DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

July 06, 2023 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zhifeng Wang, Chunyan Zeng, Surong Duan, Hongjie Ouyang, Hongmin Xu

arXiv ID: 2307.02751

Category: cs.SD (Sound)

Cross-listed: cs.CR, eess.AS

Citations: 0

Venue: arXiv.org

Last Checked: 4 months ago

Abstract

Speaker recognition is a biometric modality that uses a speaker's speech segments to recognize identity, determining whether the test speaker is one of the enrolled speakers. To improve the robustness of the i-vector framework under cross-channel conditions and to explore a novel method for applying deep learning to speaker recognition, stacked auto-encoders are used to extract an abstract representation of the i-vector instead of applying PLDA. After pre-processing and feature extraction, speaker- and channel-independent speech is used for UBM training. The UBM is then used to extract the i-vectors of the enrollment and test speech. Unlike the traditional i-vector framework, which uses linear discriminant analysis (LDA) to reduce dimension and increase the discrimination between speaker subspaces, this research uses stacked auto-encoders to reconstruct the i-vector at a lower dimension, after which different classifiers can be chosen for the final classification. The experimental results show that the proposed method achieves better performance than the state-of-the-art method.
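The dimension-reduction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give layer sizes, activations, or a training procedure, so the tanh encoder, linear decoder, greedy layer-wise training, learning rate, and cosine scoring below are all assumptions. Random vectors stand in for real i-vectors, and the function names (`train_autoencoder`, `fit_stacked_encoder`, `encode`, `cosine_score`) are hypothetical.

```python
import numpy as np


def train_autoencoder(X, hidden_dim, epochs=200, lr=0.05, seed=0):
    """Train one auto-encoder layer (tanh encoder, linear decoder) with
    full-batch gradient descent on the mean squared reconstruction error.
    Returns only the encoder weights, which are what the stack keeps."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encode
        R = H @ W2 + b2                 # decode (reconstruction)
        err = (R - X) / n               # gradient of the loss w.r.t. R
        dZ = (err @ W2.T) * (1.0 - H ** 2)  # backprop through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1


def fit_stacked_encoder(X, layer_dims, **train_kwargs):
    """Greedy layer-wise training: each layer learns to reconstruct the
    codes produced by the layer before it."""
    layers = []
    H = X
    for dim in layer_dims:
        W, b = train_autoencoder(H, dim, **train_kwargs)
        layers.append((W, b))
        H = np.tanh(H @ W + b)
    return layers


def encode(X, layers):
    """Map inputs to the lowest-dimensional code of the stack."""
    H = X
    for W, b in layers:
        H = np.tanh(H @ W + b)
    return H


def cosine_score(a, b):
    """Cosine similarity, one simple choice of back-end scorer (assumed)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for i-vectors: 40 utterances, 20 dimensions (assumed sizes).
rng = np.random.default_rng(0)
ivectors = rng.normal(size=(40, 20))

layers = fit_stacked_encoder(ivectors, [10, 5])
codes = encode(ivectors, layers)
print(codes.shape)  # (40, 5)
```

The reduced codes could then be passed to whichever classifier is chosen; `cosine_score(codes[i], codes[j])` is one simple way to compare an enrollment code with a test code in this sketch.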



📜 Similar Papers

In the same crypt — Sound

🔮 The Ethereal

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, Sander Dieleman, ... (+7 more)

cs.SD 🏛 Speech Synthesis 📚 8.0K cites 9 years ago

R.I.P. 👻 Ghosted

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

Morten Kolbæk, Dong Yu, ... (+2 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 763 cites 9 years ago

R.I.P. 👻 Ghosted

The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

Jon Barker, Shinji Watanabe, ... (+2 more)

cs.SD 🏛 Interspeech 📚 714 cites 8 years ago

R.I.P. 👻 Ghosted

TasNet: time-domain audio separation network for real-time, single-channel speech separation

Yi Luo, Nima Mesgarani

cs.SD 🏛 ICASSP 📚 711 cites 8 years ago

R.I.P. 👻 Ghosted

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Soroush Mehri, Kundan Kumar, ... (+6 more)

cs.SD 🏛 ICLR 📚 619 cites 9 years ago

R.I.P. 👻 Ghosted

MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation

Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang

cs.SD 🏛 ISMIR 📚 493 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago