Reward Imputation with Sketching for Contextual Batched Bandits

October 13, 2022 ยท Declared Dead ยท ๐Ÿ› Neural Information Processing Systems

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Xiao Zhang, Ninglu Shao, Zihua Si, Jun Xu, Wenhan Wang, Hanjing Su, Ji-Rong Wen arXiv ID 2210.06719 Category cs.LG: Machine Learning Cross-listed cs.AI Citations 4 Venue Neural Information Processing Systems Last Checked 4 months ago
Abstract
Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback. Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information. In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedbacks. We formulate reward imputation as an imputation regularized ridge regression problem that captures the feedback mechanisms of both executed and non-executed actions. To reduce time complexity, we solve the regression problem using randomized sketching. We prove that our approach achieves an instantaneous regret with controllable bias and smaller variance than approaches without reward imputation. Furthermore, our approach enjoys a sublinear regret bound against the optimal policy. We also present two extensions, a rate-scheduled version and a version for nonlinear rewards, making our approach more practical. Experimental results show that SPUIR outperforms state-of-the-art baselines on synthetic, public benchmark, and real-world datasets.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted