Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments

November 21, 2020 · Declared Dead · 🏛 IEEE International Conference on Robotics and Automation

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Elahe Aghapour, Nora Ayanian arXiv ID 2011.10714 Category cs.LG: Machine Learning Cross-listed cs.AI Citations 4 Venue IEEE International Conference on Robotics and Automation Last Checked 4 months ago

Abstract

We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem. Model-free reinforcement learning algorithms can achieve good asymptotic performance in multi-task learning at a cost of extensive sampling, due to their approach, which requires learning from scratch. While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties. Meta-reinforcement learning addresses the efficiency and generalization challenges on multi task learning by quickly leveraging the meta-prior policy for a new task. In this paper, we propose a meta-reinforcement learning approach to learn the dynamic model of a non-stationary environment to be used for meta-policy optimization later. Due to the sample efficiency of model-based learning methods, we are able to simultaneously train both the meta-model of the non-stationary environment and the meta-policy until dynamic model convergence. Then, the meta-learned dynamic model of the environment will generate simulated data for meta-policy optimization. Our experiment demonstrates that our proposed method can meta-learn the policy in a non-stationary environment with the data efficiency of model-based learning approaches while achieving the high asymptotic performance of model-free meta-reinforcement learning.