A Review of Off-Policy Evaluation in Reinforcement Learning

December 13, 2022 · The Cartographer · 🏛 arXiv.org

Authors: Masatoshi Uehara, Chengchun Shi, Nathan Kallus
arXiv ID: 2212.06355
Category: stat.ML (Machine Learning, Statistics)
Cross-listed: cs.LG, math.ST, stat.ME
Citations: 107
Venue: arXiv.org

Abstract

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to solve a number of challenging problems. In this paper, we focus primarily on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, some existing state-of-the-art OPE methods and their statistical properties, and related research directions that are under active exploration.