Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
November 08, 2020 ยท Declared Dead ยท ๐ International Conference on Machine Learning
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvรกri, Mengdi Wang
arXiv ID
2011.04019
Category
cs.LG: Machine Learning
Cross-listed
math.ST,
stat.ML
Citations
29
Venue
International Conference on Machine Learning
Last Checked
4 months ago
Abstract
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension. To reduce the Lasso bias, we further propose a post model-selection estimator that applies fitted Q-evaluation to the features selected via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends on a divergence function measuring the distribution mismatch between the data distribution and occupancy measure of the target policy. Further, we study the Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound depending on the ratio between the number of relevant features and restricted minimal eigenvalue of the data's covariance. In the end, we complement the results with minimax lower bounds for batch-data policy evaluation/optimization that nearly match our upper bounds. The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted