Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

October 23, 2019 · Declared Dead · 🏛 International Conference on Artificial Intelligence and Statistics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh arXiv ID 1910.10597 Category cs.LG: Machine Learning Cross-listed cs.AI, stat.ML Citations 140 Venue International Conference on Artificial Intelligence and Statistics Last Checked 2 months ago

Abstract

Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without adequate focus on the process of building the simulator. In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models). We approximate the real environment using a state-dependent linear combination of the ensemble, where the coefficients are determined by the given state features and some unknown parameters. Our proposed algorithm provably learns a near-optimal policy with a sample complexity polynomial in the number of unknown parameters, and incurs no dependence on the size of the state (or action) space. As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set. We provide exponential lower bounds that illustrate the fundamental hardness of this problem, and develop a provably efficient algorithm under additional natural assumptions.