On Information Gain and Regret Bounds in Gaussian Process Bandits
September 15, 2020 · Declared Dead · International Conference on Artificial Intelligence and Statistics
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sattar Vakili, Kia Khezeli, Victor Picheny
arXiv ID
2009.06966
Category
stat.ML: Machine Learning (Stat)
Cross-listed
cs.IT, cs.LG
Citations
160
Venue
International Conference on Artificial Intelligence and Statistics
Last Checked
2 months ago
Abstract
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation to commonly used kernels improves the existing bounds on $\gamma_T$ and, subsequently, the regret bounds that rely on $\gamma_T$ in numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$ and on the regret under the frequentist setting are known, our results close a gap between the upper and lower bounds that was polynomial in $T$ (up to factors logarithmic in $T$).
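To make the central quantity concrete: the maximal information gain is $\gamma_T = \max_{|A| \le T} \tfrac{1}{2} \log\det(I + \sigma^{-2} K_A)$, where $K_A$ is the kernel matrix over the observation set $A$ and $\sigma^2$ is the noise variance. Below is a minimal Python sketch (not from the paper; all function names are illustrative) that estimates $\gamma_T$ greedily for a squared-exponential kernel on a discretized 1-D domain. Greedy selection is a natural heuristic here because the information gain is monotone submodular in $A$, so the greedy set attains at least a $(1 - 1/e)$ fraction of the maximum.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel matrix between point sets X (n,d) and Y (m,d)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def information_gain(K, noise_var=0.01):
    """I(y_A; f) = 0.5 * logdet(I + K / noise_var) for a kernel matrix K over a set A."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma(candidates, T, noise_var=0.01):
    """Greedy lower-bound estimate of gamma_T = max over |A| <= T of I(y_A; f).

    Information gain is monotone submodular, so the greedy set achieves
    at least a (1 - 1/e) fraction of the true maximum.
    """
    chosen, gain = [], 0.0
    for _ in range(T):
        best_gain, best_x = -np.inf, None
        for x in candidates:
            A = np.vstack(chosen + [x])
            g = information_gain(rbf_kernel(A, A), noise_var)
            if g > best_gain:
                best_gain, best_x = g, x
        chosen.append(best_x)
        gain = best_gain
    return gain, np.vstack(chosen)

# A 1-D domain [0, 1] discretized to 50 candidate points.
grid = np.linspace(0.0, 1.0, 50)[:, None]
for T in (5, 10, 20):
    gain, _ = greedy_gamma(list(grid), T)
    print(f"T={T:2d}  greedy estimate of gamma_T: {gain:.3f}")
```

Running the sketch with growing $T$ illustrates the slow (polylogarithmic in $T$) growth of $\gamma_T$ for the squared-exponential kernel, a consequence of its exponentially decaying eigenvalues; for Matérn kernels, whose eigenvalues decay only polynomially, $\gamma_T$ grows polynomially in $T$, which is exactly the regime where the paper's improved bounds matter.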
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt: Machine Learning (Stat)
- Distilling the Knowledge in a Neural Network (Ghosted)
- Layer Normalization (Ghosted)
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (Ghosted)
- Domain-Adversarial Training of Neural Networks (Ghosted)
- Deep Learning with Differential Privacy (Ghosted)
Died the same way: Ghosted
- Language Models are Few-Shot Learners
- PyTorch: An Imperative Style, High-Performance Deep Learning Library
- XGBoost: A Scalable Tree Boosting System