Adaptive Multiple-Arm Identification
June 04, 2017 ยท Declared Dead ยท ๐ International Conference on Machine Learning
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jiecao Chen, Xi Chen, Qin Zhang, Yuan Zhou
arXiv ID
1706.01026
Category
cs.LG: Machine Learning
Citations
17
Venue
International Conference on Machine Learning
Last Checked
4 months ago
Abstract
We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game. This problem has a wide range of applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our goal is to develop a PAC algorithm, which, with probability at least $1-ฮด$, identifies a set of $K$ arms with the aggregate regret at most $ฮต$. The notion of aggregate regret for multiple-arm identification was first introduced in \cite{Zhou:14} , which is defined as the difference of the averaged expected rewards between the selected set of arms and the best $K$ arms. In contrast to \cite{Zhou:14} that only provides instance-independent sample complexity, we introduce a new hardness parameter for characterizing the difficulty of any given instance. We further develop two algorithms and establish the corresponding sample complexity in terms of this hardness parameter. The derived sample complexity can be significantly smaller than state-of-the-art results for a large class of instances and matches the instance-independent lower bound upto a $\log(ฮต^{-1})$ factor in the worst case. We also prove a lower bound result showing that the extra $\log(ฮต^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted