HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid
June 28, 2024 Β· Declared Dead Β· π Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu
arXiv ID
2406.19972
Category
cs.RO: Robotics
Citations
14
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
Physical Human-Scene Interaction (HSI) plays a crucial role in numerous applications. However, existing HSI techniques are limited to specific object dynamics and privileged information, which prevents the development of more comprehensive applications. To address this limitation, we introduce HumanVLA for general object rearrangement directed by practical vision and language. A teacher-student framework is utilized to develop HumanVLA. A state-based teacher policy is trained first using goal-conditioned reinforcement learning and adversarial motion prior. Then, it is distilled into a vision-language-action model via behavior cloning. We propose several key insights to facilitate the large-scale learning process. To support general object rearrangement by physical humanoid, we introduce a novel Human-in-the-Room dataset encompassing various rearrangement tasks. Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed approach.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Robotics
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
π
π
The Cartographer
A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles
π
π
The Cartographer
Unmanned Aerial Vehicles: A Survey on Civil Applications and Key Research Challenges
π
π
The Cartographer
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
R.I.P.
π»
Ghosted
Learning agile and dynamic motor skills for legged robots
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted