BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

April 17, 2026 ยท Grace Period ยท + Add venue

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Jiacheng Shen, Masato Hagiwara, Milad Alizadeh, Ellen Gilsenan-McMahon, Marius Miron, David Robinson, Emmanuel Chemla, Sara Keen, Gagan Narula, Mathieu Lauriรจre, Matthieu Geist, Olivier Pietquin arXiv ID 2604.16241 Category cs.CL: Computation & Language Cross-listed cs.AI Citations 0
Abstract
Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We introduce BAGEL, a benchmark for evaluating animal knowledge expertise in language models. BAGEL is constructed from diverse scientific and reference sources, including bioRxiv, Global Biotic Interactions, Xeno-canto, and Wikipedia, using a combination of curated examples and automatically generated closed-book question-answer pairs. The benchmark covers multiple aspects of animal knowledge, including taxonomy, morphology, habitat, behavior, vocalization, geographic distribution, and species interactions. By focusing on closed-book evaluation, BAGEL measures animal-related knowledge of models without external retrieval at inference time. BAGEL further supports fine-grained analysis across source domains, taxonomic groups, and knowledge categories, enabling a more precise characterization of model strengths and systematic failure modes. Our benchmark provides a new testbed for studying domain-specific knowledge generalization in language models and for improving their reliability in biodiversity-related applications.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 9 years ago