R.I.P.
π»
Ghosted
Doubly Robust Policy Evaluation and Optimization
March 10, 2015 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Miroslav DudΓk, Dumitru Erhan, John Langford, Lihong Li
arXiv ID
1503.02834
Category
stat.ME
Cross-listed
cs.AI
Citations
308
Venue
arXiv.org
Last Checked
1 month ago
Abstract
We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust estimation technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust estimation uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β stat.ME
R.I.P.
π»
Ghosted
Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology
R.I.P.
π»
Ghosted
External Validity: From Do-Calculus to Transportability Across Populations
R.I.P.
π»
Ghosted
Least Ambiguous Set-Valued Classifiers with Bounded Error Levels
R.I.P.
π»
Ghosted
Comparison of Bayesian predictive methods for model selection
R.I.P.
π»
Ghosted
PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted