$NavA^3$: Understanding Any Instruction, Navigating Anywhere, Finding Anything
August 06, 2025 · Declared Dead · 🏛 arXiv.org
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors: Lingfeng Zhang, Xiaoshuai Hao, Yingbo Tang, Haoxiang Fu, Xinyu Zheng, Pengwei Wang, Zhongyuan Wang, Wenbo Ding, Shanghang Zhang
arXiv ID: 2508.04598
Category: cs.RO (Robotics)
Citations: 13
Venue: arXiv.org
Last Checked: 1 month ago
Abstract
Embodied navigation is a fundamental capability of embodied intelligence, enabling robots to move and interact within physical environments. However, existing navigation tasks primarily focus on predefined object navigation or instruction following, which differs significantly from real-world human needs involving complex, open-ended scenes. To bridge this gap, we introduce a challenging long-horizon navigation task that requires understanding high-level human instructions and performing spatial-aware object navigation in real-world environments. Existing embodied navigation methods struggle with such tasks because they cannot comprehend high-level human instructions or localize objects with an open vocabulary. In this paper, we propose $NavA^3$, a hierarchical framework divided into two stages: a global policy and a local policy. In the global policy, we leverage the reasoning capabilities of a Reasoning-VLM to parse high-level human instructions and integrate them with global 3D scene views, allowing the agent to reason about and navigate to the regions most likely to contain the goal object. In the local policy, we collect a dataset of 1.0 million spatial-aware object affordance samples to train the NaviAfford model (a PointingVLM), which provides robust open-vocabulary object localization and spatial awareness for precise goal identification and navigation in complex environments. Extensive experiments demonstrate that $NavA^3$ achieves state-of-the-art navigation performance and successfully completes long-horizon navigation tasks across different robot embodiments in real-world settings, paving the way for universal embodied navigation. The dataset and code will be made available. Project website: https://NavigationA3.github.io/.
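Since the promised code has not been released (the reason this page declares the paper dead), here is a minimal sketch of how the abstract's two-stage global/local pipeline might be wired up. Every class name (`ReasoningVLM`, `NaviAfford`), signature, and heuristic below is a hypothetical stand-in inferred from the abstract, not the authors' API.

```python
# Hypothetical sketch of the NavA^3 two-stage pipeline from the abstract.
# The paper's code is unreleased, so every name, signature, and heuristic
# here is an illustrative assumption, not the authors' implementation.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    name: str                      # e.g. "kitchen"
    waypoint: Tuple[float, float]  # (x, y) goal for the base planner


@dataclass
class Pointing:
    label: str
    pixel: Tuple[int, int]         # (u, v) image coordinate of the object


class ReasoningVLM:
    """Stand-in for the global policy's reasoning VLM."""

    def select_region(self, instruction: str, regions: List[Region]) -> Region:
        # A real system would prompt the VLM with the instruction plus the
        # global 3D scene views; this placeholder matches region names
        # mentioned in the instruction and falls back to the first region.
        for region in regions:
            if region.name in instruction.lower():
                return region
        return regions[0]


class NaviAfford:
    """Stand-in for the local PointingVLM trained on affordance data."""

    def point_at(self, image_hw: Tuple[int, int], target: str) -> Pointing:
        # A real model would output open-vocabulary pointing coordinates;
        # this placeholder simply points at the image center.
        h, w = image_hw
        return Pointing(label=target, pixel=(w // 2, h // 2))


def nav_a3(instruction: str, target: str, regions: List[Region],
           global_vlm: ReasoningVLM, pointing_vlm: NaviAfford) -> Pointing:
    # Global policy: reason over the high-level instruction to pick the
    # region most likely to contain the goal object, then navigate there.
    region = global_vlm.select_region(instruction, regions)
    print(f"global policy -> navigate to {region.name} at {region.waypoint}")

    # Local policy: localize the target in the egocentric view and hand
    # the pointing result to a low-level controller (omitted here).
    pointing = pointing_vlm.point_at((480, 640), target)
    print(f"local policy  -> {target} pointed at pixel {pointing.pixel}")
    return pointing


if __name__ == "__main__":
    rooms = [Region("kitchen", (3.0, 1.5)), Region("bedroom", (-2.0, 4.0))]
    nav_a3("I'm thirsty, find me a mug in the kitchen", "mug",
           rooms, ReasoningVLM(), NaviAfford())
```

The split mirrors the abstract's design choice: coarse region-level reasoning stays in the slow, instruction-aware global policy, while fast open-vocabulary pointing runs locally once the robot is near the candidate region.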
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
📜 Similar Papers
In the same crypt — Robotics
R.I.P. · 👻 Ghosted · ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras
R.I.P. · 👻 Ghosted · VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
R.I.P. · 👻 Ghosted · ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
R.I.P. · 👻 Ghosted · Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
R.I.P. · 👻 Ghosted · Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Died the same way — ⏳ Coming Soon™
R.I.P. · ⏳ Coming Soon™ · Exploring Simple Siamese Representation Learning
R.I.P. · ⏳ Coming Soon™ · An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P. · ⏳ Coming Soon™ · Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection