ClevrSkills: Compositional Language and Visual Reasoning in Robotics
November 13, 2024 Β· Declared Dead Β· π Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic
arXiv ID
2411.09052
Category
cs.RO: Robotics
Cross-listed
cs.LG
Citations
7
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
Robotics tasks are highly compositional by nature. For example, to perform a high-level task like cleaning the table a robot must employ low-level capabilities of moving the effectors to the objects on the table, pick them up and then move them off the table one-by-one, while re-evaluating the consequently dynamic scenario in the process. Given that large vision language models (VLMs) have shown progress on many tasks that require high level, human-like reasoning, we ask the question: if the models are taught the requisite low-level capabilities, can they compose them in novel ways to achieve interesting high-level tasks like cleaning the table without having to be explicitly taught so? To this end, we present ClevrSkills - a benchmark suite for compositional reasoning in robotics. ClevrSkills is an environment suite developed on top of the ManiSkill2 simulator and an accompanying dataset. The dataset contains trajectories generated on a range of robotics tasks with language and visual annotations as well as multi-modal prompts as task specification. The suite includes a curriculum of tasks with three levels of compositional understanding, starting with simple tasks requiring basic motor skills. We benchmark multiple different VLM baselines on ClevrSkills and show that even after being pre-trained on large numbers of tasks, these models fail on compositional reasoning in robotics tasks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Robotics
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
π
π
The Cartographer
A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles
π
π
The Cartographer
Unmanned Aerial Vehicles: A Survey on Civil Applications and Key Research Challenges
π
π
The Cartographer
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
R.I.P.
π»
Ghosted
Learning agile and dynamic motor skills for legged robots
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted