Adversarial Dueling Bandits

October 27, 2020 · Declared Dead · 🏛 International Conference on Machine Learning

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Aadirupa Saha, Tomer Koren, Yishay Mansour
arXiv ID: 2010.14563
Category: cs.LG (Machine Learning)
Cross-listed: stat.ML
Citations: 29
Venue: International Conference on Machine Learning
Last Checked: 4 months ago

Abstract
We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only a relative binary `win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose $T$-round regret compared to the \emph{Borda-winner} from a set of $K$ items is $\tilde{O}(K^{1/3}T^{2/3})$, as well as a matching $\Omega(K^{1/3}T^{2/3})$ lower bound. We also prove a similar high probability regret bound. We further consider a simpler \emph{fixed-gap} adversarial setup, which bridges between two extreme preference feedback models for dueling bandits: stationary preferences and an arbitrary sequence of preferences. For the fixed-gap adversarial setup we give an $\tilde{O}((K/\Delta^2)\log{T})$ regret algorithm, where $\Delta$ is the gap in Borda scores between the best item and all other items, and show a lower bound of $\Omega(K/\Delta^2)$ indicating that our dependence on the main problem parameters $K$ and $\Delta$ is tight (up to logarithmic factors).
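
The abstract does not spell out how Borda scores or the regret are computed, so the following is only a minimal sketch under the commonly used definitions: the Borda score of item $i$ in a round is its average probability of beating each other item, and the per-round regret of a duel $(x, y)$ is the Borda score of the best fixed item in hindsight minus the average Borda score of $x$ and $y$. The function names `borda_scores` and `borda_regret` and the example matrix are illustrative assumptions, not taken from the paper.

```python
# Sketch only: assumes the common Borda-score and Borda-regret definitions,
# which the abstract itself does not state explicitly.

def borda_scores(P):
    """Borda score of each item: mean probability of beating another item.

    P is a K x K preference matrix, P[i][j] = probability that i beats j.
    """
    K = len(P)
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1)
            for i in range(K)]


def borda_regret(pref_matrices, duels):
    """Cumulative regret against the best fixed item in hindsight.

    pref_matrices: one preference matrix per round (may change every round).
    duels: one (x, y) pair of item indices per round.
    """
    K = len(pref_matrices[0])
    per_round = [borda_scores(P) for P in pref_matrices]
    # Borda winner: item with the largest cumulative Borda score.
    totals = [sum(scores[i] for scores in per_round) for i in range(K)]
    best = max(range(K), key=lambda i: totals[i])
    regret = sum(scores[best] - (scores[x] + scores[y]) / 2
                 for scores, (x, y) in zip(per_round, duels))
    return best, regret


# Hypothetical 3-item example: item 0 beats the others with probability 0.75.
P = [[0.5, 0.75, 0.75],
     [0.25, 0.5, 0.5],
     [0.25, 0.5, 0.5]]
best, regret = borda_regret([P, P], [(0, 0), (1, 2)])
print(best, regret)
```

In this example, dueling the Borda winner against itself in the first round incurs no regret, while dueling the two weaker items in the second round incurs the full gap in Borda scores.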
