A Survey on Multimodal Benchmarks: In the Era of Large AI Models
September 21, 2024 · The Cartographer · arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: A Survey on Multimodal Benchmarks: In the Era of Large AI Models"
Evidence collected by the PWNC Scanner
Authors
Lin Li, Guikun Chen, Hanrong Shi, Jun Xiao, Long Chen
arXiv ID
2409.18142
Category
cs.AI: Artificial Intelligence
Cross-listed
cs.MM
Citations
25
Venue
arXiv.org
Last Checked
2 days ago
Abstract
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial advancements in artificial intelligence, significantly enhancing the capability to understand and generate multimodal content. While prior studies have largely concentrated on model architectures and training methodologies, a thorough analysis of the benchmarks used for evaluating these models remains underexplored. This survey addresses this gap by systematically reviewing 211 benchmarks that assess MLLMs across four core domains: understanding, reasoning, generation, and application. We provide a detailed analysis of task designs, evaluation metrics, and dataset constructions across diverse modalities. We hope that this survey will contribute to the ongoing advancement of MLLM research by offering a comprehensive overview of benchmarking practices and identifying promising directions for future work. An associated GitHub repository collecting the latest papers is available.
Similar Papers
In the same crypt: Artificial Intelligence
The Cartographer
R.I.P.
Ghosted
Explanation in Artificial Intelligence: Insights from the Social Sciences
R.I.P.
Ghosted
Federated Machine Learning: Concept and Applications
R.I.P.
Ghosted
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
R.I.P.
Ghosted
DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
R.I.P.
Ghosted