DENOASR: Debiasing ASRs through Selective Denoising

October 22, 2024 · Declared Dead · 🏛 2024 IEEE International Conference on Knowledge Graph (ICKG)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Anand Kumar Rai, Siddharth D Jaiswal, Shubham Prakash, Bendi Pragnya Sree, Animesh Mukherjee arXiv ID 2410.16712 Category cs.SD: Sound Cross-listed cs.CL, eess.AS Citations 3 Venue 2024 IEEE International Conference on Knowledge Graph (ICKG) Last Checked 3 months ago

Abstract

Automatic Speech Recognition (ASR) systems have been examined and shown to exhibit biases toward particular groups of individuals, influenced by factors such as demographic traits, accents, and speech styles. Noise can disproportionately impact speakers with certain accents, dialects, or speaking styles, leading to biased error rates. In this work, we introduce a novel framework DENOASR, which is a selective denoising technique to reduce the disparity in the word error rates between the two gender groups, male and female. We find that a combination of two popular speech denoising techniques, viz. DEMUCS and LE, can be effectively used to mitigate ASR disparity without compromising their overall performance. Experiments using two state-of-the-art open-source ASRs - OpenAI WHISPER and NVIDIA NEMO - on multiple benchmark datasets, including TIE, VOX-POPULI, TEDLIUM, and FLEURS, show that there is a promising reduction in the average word error rate gap across the two gender groups. For a given dataset, the denoising is selectively applied on speech samples having speech intelligibility below a certain threshold, estimated using a small validation sample, thus ameliorating the need for large-scale human-written ground-truth transcripts. Our findings suggest that selective denoising can be an elegant approach to mitigate biases in present-day ASR systems.