A Tutorial on Thompson Sampling
July 07, 2017 · The Cartographer · Found. Trends Mach. Learn.
"No code URL or promise found in abstract"
"Survey/review paper: maps the landscape rather than implementing a method"
Evidence collected by the PWNC Scanner
Authors
Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
arXiv ID
1707.02038
Category
cs.LG: Machine Learning
Citations
1.1K
Venue
Found. Trends Mach. Learn.
Abstract
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
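As an illustration of the exploit/explore balance described in the abstract, here is a minimal sketch of Thompson sampling for a Bernoulli bandit, the first example the abstract lists. It is not taken from the tutorial: the function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) priors, and the simulated arm probabilities are all assumptions made for this example.

```python
import random


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Run Thompson sampling on a simulated Bernoulli bandit.

    true_probs is a hypothetical list of per-arm success probabilities,
    used only to simulate rewards; the algorithm never reads it directly.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success rate.
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k
    total_reward = 0

    for _ in range(n_rounds):
        # Sample one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # Act greedily with respect to the sampled rates.
        action = max(range(k), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[action] else 0
        # Conjugate Beta-Bernoulli posterior update for the chosen arm only.
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1
        total_reward += reward

    return pulls, total_reward, alpha, beta


if __name__ == "__main__":
    pulls, total_reward, alpha, beta = thompson_sampling_bernoulli(
        [0.1, 0.5, 0.9], n_rounds=2000, seed=1
    )
    print("pulls per arm:", pulls)
    print("total reward:", total_reward)
```

Arms whose posteriors are still wide occasionally produce the largest sample and get tried, while arms that are confidently poor are chosen less and less often. This sketch treats the arms as independent, so it does not cover the richer information structures the abstract mentions, where one action's outcome informs beliefs about other actions.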
Similar Papers
In the same crypt: Machine Learning
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
Transcended