R.I.P.
👻
Ghosted
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
July 13, 2024 · Declared Dead · Robotics: Science and Systems
Authors
Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang
arXiv ID
2407.09829
Category
cs.RO: Robotics
Citations
31
Venue
Robotics: Science and Systems
Repository
https://github.com/PPjmchen/VLMPC
Last Checked
3 months ago
Abstract
Although Model Predictive Control (MPC) can effectively predict the future states of a system and is therefore widely used in robotic manipulation tasks, it lacks environmental perception, which leads to failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework that integrates the perception capability of a vision-language model (VLM) with MPC. Specifically, we propose a conditional action sampling module that takes a goal image or a language instruction as input and leverages the VLM to sample a set of candidate action sequences. A lightweight action-conditioned video prediction model then generates a set of future frames conditioned on the candidate action sequences. With the assistance of the VLM, VLMPC produces the optimal action sequence through a hierarchical cost function that formulates both pixel-level and knowledge-level consistency between the current observation and the goal image. We demonstrate that VLMPC outperforms state-of-the-art methods on public benchmarks. More importantly, our method shows excellent performance in various real-world robotic manipulation tasks. Code is available at https://github.com/PPjmchen/VLMPC.
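The steps the abstract lists (sample candidate action sequences, predict their outcomes, score them against the goal, pick the best) follow the general pattern of sampling-based MPC. Below is a minimal sketch of that generic loop, not the authors' implementation: the uniform random sampler, the point-mass predictor, and the squared-distance cost are toy stand-ins for the VLM-guided sampler, the video prediction model, and the hierarchical cost, and every function name and parameter here is hypothetical.

```python
import random


def sample_action_sequences(num_candidates, horizon, rng, scale=0.2):
    """Toy stand-in for a conditional action sampler (assumption: uniform 2-D displacements)."""
    return [
        [(rng.uniform(-scale, scale), rng.uniform(-scale, scale)) for _ in range(horizon)]
        for _ in range(num_candidates)
    ]


def predict_states(state, actions):
    """Toy stand-in for an action-conditioned predictor: accumulate displacements."""
    x, y = state
    states = []
    for dx, dy in actions:
        x, y = x + dx, y + dy
        states.append((x, y))
    return states


def cost(predicted_states, goal):
    """Toy stand-in cost: squared distance from the last predicted state to the goal."""
    fx, fy = predicted_states[-1]
    return (fx - goal[0]) ** 2 + (fy - goal[1]) ** 2


def mpc_step(state, goal, rng, num_candidates=64, horizon=5):
    """Sample candidates, score each predicted rollout, return the first action of the best."""
    candidates = sample_action_sequences(num_candidates, horizon, rng)
    best = min(candidates, key=lambda seq: cost(predict_states(state, seq), goal))
    return best[0]  # receding horizon: execute only the first action, then replan


def run(start, goal, steps=30, seed=0):
    """Apply mpc_step repeatedly and return the final state."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        dx, dy = mpc_step(state, goal, rng)
        state = (state[0] + dx, state[1] + dy)
    return state


if __name__ == "__main__":
    print(run((0.0, 0.0), (1.0, 1.0)))
```

In this sketch only the first action of the lowest-cost sequence is executed before replanning, which is the usual receding-horizon choice; whether VLMPC executes one action or several per planning round is not stated in the abstract.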
Similar Papers
In the same crypt: Robotics
R.I.P.
👻
Ghosted
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
The Cartographer
A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles
The Cartographer
Unmanned Aerial Vehicles: A Survey on Civil Applications and Key Research Challenges
The Cartographer
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
R.I.P.
👻
Ghosted
Learning agile and dynamic motor skills for legged robots
Died the same way: 404 Not Found
R.I.P.
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
404 Not Found