Multi-Armed Bandits with Metric Movement Costs

October 24, 2017 ยท Declared Dead ยท ๐Ÿ› Neural Information Processing Systems

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Tomer Koren, Roi Livni, Yishay Mansour arXiv ID 1710.08997 Category cs.LG: Machine Learning Citations 21 Venue Neural Information Processing Systems Last Checked 3 months ago
Abstract
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret $\widetildeฮ˜(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=ฮ˜(k)$, and (ii) the interval metric with regret $\widetildeฮ˜(\max\{T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=ฮ˜(1)$. For infinite metrics spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetildeฮ˜(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted