LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks
April 18, 2026
Authors
Xueyao Chen, Jingkai Jia, Tong Yang, Yibo Fu, Wei Li, Wenqiang Zhang
arXiv ID
2604.16788
Category
cs.RO: Robotics
Citations
0
Abstract
Robotic manipulation policies often degrade over extended horizons, yet existing benchmarks provide limited insight into why such failures occur. Most prior benchmarks are either simulation-based or report aggregate success, making it difficult to disentangle the distinct sources of temporal difficulty in real-world execution. We introduce LongBench, a real-world benchmark for evaluating long-horizon manipulation. LongBench consists of over 1,000 real-world episodes, covering two complementary regimes: Context-Independent (fully observable) and Context-Dependent (ambiguity-driven). By organizing tasks into capability- and ambiguity-specific subsets, LongBench enables mechanism-aware evaluation of execution robustness, temporal consistency, and context-dependent reasoning. Evaluating six state-of-the-art policies reveals that long-horizon performance is not governed by a single factor. We observe that performance in fully observable settings is more strongly associated with execution robustness, while contextual difficulty varies across tasks and is not consistently improved by memory-based methods. We hope that LongBench serves as a useful benchmark for studying long-horizon manipulation and for developing policies with stronger robustness across both execution and contextual challenges.