EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering

October 26, 2024 · Declared Dead · 🏛 IEEE/RSJ International Conference on Intelligent Robots and Systems

Repo contents: LICENSE, README.md

Authors: Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera
arXiv ID: 2410.20263
Category: cs.RO (Robotics)
Cross-listed: cs.AI, cs.CV
Citations: 10
Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
Repository: https://github.com/chengkaiAcademyCity/EfficientEQA (⭐ 1)
Last Checked: 1 month ago

Abstract

Embodied Question Answering (EQA) is an essential yet challenging task for robot assistants. Large vision-language models (VLMs) have shown promise for EQA, but existing approaches either treat it as static video question answering without active exploration or restrict answers to a closed set of choices. These limitations hinder real-world applicability, where a robot must explore efficiently and provide accurate answers in open-vocabulary settings. To overcome these challenges, we introduce EfficientEQA, a novel framework that couples efficient exploration with free-form answer generation. EfficientEQA features three key innovations: (1) Semantic-Value-Weighted Frontier Exploration (SFE) with Verbalized Confidence (VC) from a black-box VLM to prioritize semantically important areas to explore, enabling the agent to gather relevant information faster; (2) a BLIP relevancy-based mechanism to stop adaptively by flagging highly relevant observations as outliers to indicate whether the agent has collected enough information; and (3) a Retrieval-Augmented Generation (RAG) method for the VLM to answer accurately based on pertinent images from the agent's observation history without relying on predefined choices. Our experimental results show that EfficientEQA achieves over 15% higher answer accuracy and requires over 20% fewer exploration steps than state-of-the-art methods. Our code is available at: https://github.com/chengkaiAcademyCity/EfficientEQA
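The first two mechanisms in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the greedy frontier choice, the z-score outlier test, and the names `pick_frontier`, `should_stop`, `z_threshold`, and `min_history` are all assumptions made for illustration. The semantic values stand in for verbalized confidences returned by a VLM, and the relevancy scores stand in for BLIP image-question relevancy.

```python
from statistics import mean, pstdev


def pick_frontier(frontiers, semantic_values):
    """Return the frontier with the highest semantic value.

    Assumption: a greedy argmax stands in for the paper's
    semantic-value weighting, which the abstract does not specify.
    """
    if not frontiers or len(frontiers) != len(semantic_values):
        raise ValueError("need one semantic value per frontier")
    best = max(range(len(frontiers)), key=lambda i: semantic_values[i])
    return frontiers[best]


def should_stop(relevancy_scores, z_threshold=2.0, min_history=5):
    """Return True when the latest score is a high outlier.

    Assumption: "outlier" is modelled as a z-score above z_threshold,
    computed against all earlier scores in the episode.
    """
    if len(relevancy_scores) < min_history:
        return False  # too few observations to estimate a baseline
    *history, latest = relevancy_scores
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return False  # no spread in the history, so no z-score is defined
    return (latest - mu) / sigma > z_threshold


if __name__ == "__main__":
    print(pick_frontier(["kitchen", "hall", "bedroom"], [0.7, 0.1, 0.4]))
    print(should_stop([0.20, 0.21, 0.19, 0.20, 0.22, 0.90]))
```

In this sketch the agent would call `pick_frontier` at each step to choose where to go next, append the new observation's relevancy score to its history, and end exploration once `should_stop` returns True. The retrieval-augmented answering step would then select the most relevant stored images for the VLM.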