Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

October 11, 2023 · Declared Dead · 🏛 Synthetic Data’s Transformative Role in Foundational Speech Models

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Joseph Konan, Shikhar Agnihotri, Ojas Bhargave, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj arXiv ID 2310.07161 Category cs.SD: Sound Cross-listed cs.CL, eess.AS Citations 1 Venue Synthetic Data’s Transformative Role in Foundational Speech Models Last Checked 4 months ago

Abstract

Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research, rooted in the exploration of proprietary sender-side denoising effects, meticulously evaluates platforms such as Google Meets and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces. A methodological novelty is introduced via Blinder-Oaxaca decomposition, traditionally an econometric tool, repurposed herein to analyze acoustic-phonetic perturbations within VoIP systems. To further ground the implications of these transformations, psychoacoustic metrics, specifically PESQ and STOI, were used to explain of perceptual quality and intelligibility. Cumulatively, the insights garnered underscore the intricate landscape of VoIP-influenced acoustic dynamics. In addition to the primary findings, a multitude of metrics are reported, extending the research purview. Moreover, out-of-domain benchmarking for both time and time-frequency domain speech enhancement models is included, thereby enhancing the depth and applicability of this inquiry.