Demographic and Linguistic Bias Evaluation in Omnimodal Language Models

April 11, 2026 · Grace Period · 🏛 ICPR 2026

Authors: Alaa Elobaid · arXiv ID: 2604.10014 · Category: cs.CV (Computer Vision) · Cross-listed: cs.AI, cs.CL · Citations: 0 · Venue: ICPR 2026

Abstract

This paper provides a comprehensive evaluation of demographic and linguistic biases in omnimodal language models, which process text, images, audio, and video within a single framework. Although these models are being widely deployed, their performance across demographic groups and modalities is not well studied. Four omnimodal models are evaluated on tasks that include demographic attribute estimation, identity verification, activity recognition, multilingual speech transcription, and language identification. Accuracy differences are measured across age, gender, skin tone, language, and country of origin. The results show that image and video understanding tasks generally perform better and show smaller demographic disparities. In contrast, audio understanding tasks perform significantly worse and show substantial bias, including large accuracy differences across age groups, genders, and languages, and frequent collapse of predictions toward a narrow set of categories. These findings highlight the importance of evaluating fairness across all supported modalities as omnimodal language models are increasingly used in real-world applications.
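
The abstract names two kinds of measurement: accuracy differences across demographic groups, and prediction collapse toward narrow categories. The sketch below illustrates one plausible way to compute each. It is not the paper's method: the abstract does not define its metrics, so the max-minus-min accuracy gap, the "share of the most frequent prediction" collapse indicator, the function names, and the toy records are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def group_accuracies(records):
    """Accuracy per group from (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in records:
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(acc_by_group):
    """Assumed disparity measure: best-group accuracy minus worst-group accuracy."""
    return max(acc_by_group.values()) - min(acc_by_group.values())


def top_prediction_share(predictions):
    """Assumed collapse indicator: most frequent predicted label and its share."""
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


# Toy, made-up records for a language identification task, grouped by age band.
records = [
    ("18-30", "en", "en"),
    ("18-30", "fr", "fr"),
    ("18-30", "de", "en"),
    ("18-30", "en", "en"),
    ("60+", "fr", "en"),
    ("60+", "de", "en"),
    ("60+", "en", "en"),
    ("60+", "fr", "en"),
]

acc = group_accuracies(records)
gap = accuracy_gap(acc)
label, share = top_prediction_share([pred for _, _, pred in records])

print(acc)           # per-group accuracy
print(gap)           # accuracy gap between best and worst group
print(label, share)  # dominant predicted label and its share of all predictions
```

In this toy data the model predicts "en" for almost every sample, so the share of the dominant label is high while accuracy differs sharply between the two age bands. A share near 1.0 on a task with many balanced classes would be one signal of the kind of collapse the abstract describes.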