Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
November 02, 2020 Β· Declared Dead Β· π IEEE International Conference on Robotics and Automation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yuchen Wu, Melissa Mozifian, Florian Shkurti
arXiv ID
2011.01298
Category
cs.RO: Robotics
Cross-listed
cs.LG
Citations
25
Venue
IEEE International Conference on Robotics and Automation
Last Checked
4 months ago
Abstract
The potential benefits of model-free reinforcement learning to real robotics systems are limited by its uninformed exploration that leads to slow convergence, lack of data-efficiency, and unnecessary interactions with the environment. To address these drawbacks we propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model. We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first. Unlike the majority of existing methods that assume optimal demonstrations and incorporate the demonstration data as hard constraints on policy optimization, we instead incorporate demonstration data as advice in the form of a reward shaping potential trained as a generative model of states and actions. In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials. We show that, unlike many existing approaches that incorporate demonstrations as hard constraints, our approach is unbiased even in the case of suboptimal and noisy demonstrations. We present an extensive range of simulations, as well as experiments on the Franka Emika 7DOF arm, to demonstrate the practicality of our method.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Robotics
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
π
π
The Cartographer
A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles
π
π
The Cartographer
Unmanned Aerial Vehicles: A Survey on Civil Applications and Key Research Challenges
π
π
The Cartographer
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
R.I.P.
π»
Ghosted
Learning agile and dynamic motor skills for legged robots
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted