On the Value of Bandit Feedback for Offline Recommender System Evaluation

July 26, 2019 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Olivier Jeunen, David Rohde, Flavian Vasile
arXiv ID: 1907.12384
Category: cs.IR (Information Retrieval)
Cross-listed: cs.LG, stat.ML
Citations: 11
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

In academic literature, recommender systems are often evaluated on the task of next-item prediction. The procedure aims to give an answer to the question: "Given the natural sequence of user-item interactions up to time t, can we predict which item the user will interact with at time t+1?". Evaluation results obtained through said methodology are then used as a proxy to predict which system will perform better in an online setting. The online setting, however, poses a subtly different question: "Given the natural sequence of user-item interactions up to time t, can we get the user to interact with a recommended item at time t+1?". From a causal perspective, the system performs an intervention, and we want to measure its effect. Next-item prediction is often used as a fall-back objective when information about interventions and their effects (shown recommendations and whether they received a click) is unavailable. When this type of data is available, however, it can provide great value for reliably estimating online recommender system performance. Through a series of simulated experiments with the RecoGym environment, we show where traditional offline evaluation schemes fall short. Additionally, we show how so-called bandit feedback can be exploited for effective offline evaluation that more accurately reflects online performance.
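To make the idea of evaluating with bandit feedback concrete, here is a minimal sketch of one common counterfactual estimator, inverse propensity scoring (IPS). The abstract does not name the estimators the paper uses, so IPS is an assumption chosen for illustration. The sketch does not use the RecoGym API. The log format, the uniform logging policy, the click probabilities, and the names `ips_estimate`, `simulate_logs`, and `always_item_1` are all hypothetical.

```python
import random


def ips_estimate(logs, target_policy):
    """Estimate a target policy's click-through rate from logged bandit feedback.

    logs: list of (context, action, propensity, click) tuples, where
          propensity is the probability with which the logging policy
          showed `action` (hypothetical log format).
    target_policy: function (context, action) -> probability that the
          policy under evaluation would show `action`.
    """
    total = 0.0
    for context, action, propensity, click in logs:
        # Reweight each logged click by how much more (or less) likely the
        # target policy is to show this recommendation than the logger was.
        weight = target_policy(context, action) / propensity
        total += weight * click
    return total / len(logs)


def simulate_logs(n, click_prob, seed=0):
    """Simulate a uniform-random logging policy (an assumption for this sketch).

    click_prob[a] is a made-up click probability for recommending item a.
    """
    rng = random.Random(seed)
    n_items = len(click_prob)
    logs = []
    for _ in range(n):
        action = rng.randrange(n_items)
        click = 1 if rng.random() < click_prob[action] else 0
        logs.append((None, action, 1.0 / n_items, click))
    return logs


# Hypothetical click probabilities for three items.
click_prob = [0.02, 0.10, 0.05]
logs = simulate_logs(50_000, click_prob)


def always_item_1(context, action):
    """Deterministic target policy that always recommends item 1."""
    return 1.0 if action == 1 else 0.0


estimate = ips_estimate(logs, always_item_1)
print(f"IPS estimate of click-through rate: {estimate:.4f}")
```

Under these assumptions the estimate should land close to the simulated click probability of item 1, because the logs record which recommendation was shown and whether it was clicked. A next-item prediction metric has no access to that information, since it only scores agreement with organic interactions.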