VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

April 19, 2026 · Grace Period · 🏛 INTERSPEECH 2026

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Yi-Cheng Lin, Yusuke Hirota, Sung-Feng Huang, Hung-yi Lee
arXiv ID: 2604.17248
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.SD
Citations: 0
Venue: INTERSPEECH 2026
Abstract
Large Audio-Language Models (LALMs) are increasingly integrated into daily applications, yet their generative biases remain underexplored. Existing speech fairness benchmarks rely on synthetic speech and Multiple-Choice Questions (MCQs), both of which offer only a fragmented view of fairness. We propose VIBE, a framework that evaluates generative bias through open-ended tasks such as personalized recommendations, using real-world human recordings. Unlike MCQs, our method allows stereotypical associations to manifest organically without predefined options, making it easily extensible to new tasks. Evaluating 11 state-of-the-art LALMs reveals systematic biases in realistic scenarios. We find that gender cues often trigger larger distributional shifts than accent cues, indicating that current LALMs reproduce social stereotypes.
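The abstract does not spell out how a "distributional shift" between voice groups is scored, and no code is released yet. As an illustration only, one standard way to quantify such a shift between open-ended categorical outputs is the Jensen-Shannon divergence between the response distributions elicited by two voice groups. A minimal sketch, with hypothetical data and helper names (not the paper's actual metric or pipeline):

```python
import math
from collections import Counter

def to_dist(items):
    """Normalize a list of categorical model outputs into a probability distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so bounded in [0, 1]) between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        # KL(a || b); terms with a(k) = 0 contribute nothing.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical career recommendations for the same spoken prompt,
# grouped by the perceived gender of the speaker's voice.
female_voice = ["nursing", "teaching", "nursing", "design"]
male_voice = ["engineering", "finance", "engineering", "design"]

shift = js_divergence(to_dist(female_voice), to_dist(male_voice))  # 0.75 for this toy data
```

A larger divergence for gender-contrasted voices than for accent-contrasted voices, holding the prompt fixed, would match the pattern the abstract reports; in practice one would also need many prompts per task and a canonicalization step for free-form answers before counting them as categories.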
Community shame: Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt · Audio & Speech