Covariance-adapting algorithm for semi-bandits with application to sparse rewards

April 15, 2026 · Grace Period · 🏛 Proceedings of the 33rd Annual Conference on Learning Theory (COLT 2020), PMLR 125, 2020

⏳ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Pierre Perrault, Vianney Perchet, Michal Valko
arXiv ID: 2604.13738
Category: stat.ML (Machine Learning, Stat)
Cross-listed: cs.LG
Citations: 0
Venue: Proceedings of the 33rd Annual Conference on Learning Theory (COLT 2020), PMLR 125, 2020
Abstract
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes determines the complexity of the problem instance (unlike in standard bandits). The distributions typically considered depend on specific parameter values whose prior knowledge is required in theory but which are difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains both bounded and Gaussian ones. We prove a new lower bound on the expected regret over this family, parameterized by the unknown covariance matrix of the outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates and provide a tight asymptotic analysis of its regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
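To make the abstract's idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of a covariance-aware UCB loop for Gaussian semi-bandits: the learner plays a subset of arms each round, maintains empirical pairwise covariance estimates from the coordinates it observes, and widens each action's confidence bonus according to the estimated covariance of that subset. The function name, the crude identity regularization, and the exact confidence width are all assumptions for illustration.

```python
import numpy as np

def covariance_ucb_semi_bandit(mu, Sigma, actions, T=3000, seed=0):
    """Illustrative covariance-adaptive semi-bandit loop (hypothetical sketch).

    mu:      true mean vector of the d base arms (used only by the simulator)
    Sigma:   true covariance of the joint outcome distribution (simulator only)
    actions: list of frozensets of arm indices (the combinatorial action set)
    Returns the cumulative expected regret after T rounds.
    """
    rng = np.random.default_rng(seed)
    d = len(mu)
    n = np.zeros(d)            # per-arm observation counts
    s = np.zeros(d)            # per-arm reward sums
    n2 = np.zeros((d, d))      # pairwise joint-observation counts
    s2 = np.zeros((d, d))      # pairwise cross-product sums
    best = max(mu[list(a)].sum() for a in actions)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= len(actions):
            a = actions[t - 1]  # play each action once to initialize
        else:
            mean_hat = s / np.maximum(n, 1)
            # empirical covariance, crudely regularized toward the identity
            cov_hat = s2 / np.maximum(n2, 1) - np.outer(mean_hat, mean_hat)
            cov_hat = cov_hat + np.eye(d)

            def index(act):
                idx = list(act)
                # variance of the subset's total reward under cov_hat
                var = max(cov_hat[np.ix_(idx, idx)].sum(), 0.0)
                width = np.sqrt(2.0 * np.log(t) * var / max(n[idx].min(), 1))
                return mean_hat[idx].sum() + width

            a = max(actions, key=index)
        idx = list(a)
        x = rng.multivariate_normal(mu, Sigma)  # semi-bandit feedback on idx
        n[idx] += 1
        s[idx] += x[idx]
        for i in idx:
            for j in idx:
                n2[i, j] += 1
                s2[i, j] += x[i] * x[j]
        regret += best - mu[idx].sum()
    return regret
```

Because the bonus scales with the estimated covariance of the chosen subset rather than a worst-case sub-Gaussian proxy, correlated-but-low-variance actions are eliminated faster; on small instances the cumulative regret grows only logarithmically in T.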
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt β€” Machine Learning (Stat)

R.I.P. 👻 Ghosted

Graph Attention Networks

Petar Veličković, Guillem Cucurull, ... (+4 more)

stat.ML πŸ› ICLR πŸ“š 24.7K cites 8 years ago
R.I.P. πŸ‘» Ghosted

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

stat.ML πŸ› arXiv πŸ“š 12.0K cites 9 years ago