Active Reasoning Vision-Language Models via Sequential Experimental Design

May 02, 2026 ยท Grace Period ยท ๐Ÿ› ICML 2026

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Anjie Liu, Ziqin Gong, Yan Song, Yuxiang Chen, Xiaolong Liu, Hengtong Lu, Kaike Zhang, Chen Wei arXiv ID 2605.01345 Category cs.CV: Computer Vision Cross-listed cs.AI, cs.LG Citations 0 Venue ICML 2026
Abstract
Visual perception in modern Vision-Language Models (VLMs) is constrained by a fundamental perceptual bandwidth bottleneck: a broad field of view inevitably sacrifices the fine-grained details necessary for complex reasoning. Inspired by the classical paradigms of active vision and information foraging, we frame overcoming this limitation as a sequential decision-making process. We formalise this process through the lens of the sequential Bayesian optimal experimental design (S-BOED) problem. While exact Bayesian inference is intractable in continuous gigapixel spaces, we derive principled yet tractable approximations that balance spatial coverage against resolution. To validate this framework, we present a training-free inference strategy as a practical instantiation of the S-BOED objective for agents equipped with multiple vision tools. Designed as a flexible template, this strategy accommodates arbitrary optimisation algorithms, ranging from efficient greedy sampling to look-ahead planning, to approximate the optimal design. Empirical evaluations on gigapixel-level benchmarks demonstrate that our approach further boosts the performance of state-of-the-art models, significantly outperforming standard baselines and effectively narrowing the gap towards human-annotated oracles.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

๐ŸŒ… ๐ŸŒ… Old Age

Fast R-CNN

Ross Girshick

cs.CV ๐Ÿ› ICCV ๐Ÿ“š 27.7K cites 11 years ago