Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

October 29, 2019 · Declared Dead · 🏛 AAAI Conference on Artificial Intelligence

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Bhavya Ghai, Buvana Ramanan, Klaus Mueller
arXiv ID: 1910.13488
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.SD
Citations: 1
Venue: AAAI Conference on Artificial Intelligence
Last Checked: 3 months ago
Abstract
Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. They typically require large amounts of clean speech data for training, which gives an undue advantage to large organizations that hold vast amounts of private data. In this paper, we first curated a fairly large dataset from publicly available data sources. We then investigated whether publicly available noisy data can be used to train robust ASR systems. We used speech enhancement to clean the noisy data and then trained ASR systems on the noisy data together with its enhanced version. We found that using speech enhancement gives a 9.5% better word error rate than training on noisy data alone and a 9% better word error rate than training on clean data alone. Its performance is also comparable to the ideal-case scenario of training on noisy data together with its true clean version.
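The abstract reports its results in terms of word error rate (WER). As a minimal sketch, and not the authors' evaluation code, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. Whitespace tokenization is an assumption here. The abstract does not state whether the 9.5% and 9% figures are relative or absolute differences; the hypothetical helper `relative_improvement` shows the relative reading.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes simple whitespace tokenization (no text normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[0] = i: all i reference words deleted.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: fractional WER reduction relative to a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under the relative reading, a system whose WER drops from a baseline value b to a new value n improves by (b - n) / b; under the absolute reading, the difference b - n itself would be reported.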

📜 Similar Papers

In the same crypt: Audio & Speech

Died the same way: 👻 Ghosted