Three Methods for Training on Bandit Feedback

April 24, 2019 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Dmytro Mykhaylov, David Rohde, Flavian Vasile, Martin Bompaire, Olivier Jeunen arXiv ID 1904.10799 Category cs.IR: Information Retrieval Cross-listed stat.ML Citations 7 Venue arXiv.org Last Checked 4 months ago

Abstract

There are three quite distinct ways to train a machine learning model on recommender system logs. The first method is to model the reward prediction for each possible recommendation to the user, at the scoring time the best recommendation is found by computing an argmax over the personalized recommendations. This method obeys principles such as the conditionality principle and the likelihood principle. A second method is useful when the model does not fit reality and underfits. In this case, we can use the fact that we know the distribution of historical recommendations (concentrated on previously identified good actions with some exploration) to adjust the errors in the fit to be evenly distributed over all actions. Finally, the inverse propensity score can be used to produce an estimate of the decision rules expected performance. The latter two methods violate the conditionality and likelihood principle but are shown to have good performance in certain settings. In this paper we review the literature around this fundamental, yet often overlooked choice and do some experiments using the RecoGym simulation environment.

📄 View on arXiv 🌐 View on ar5iv 📑 PDF 🎉 Report Code Found

Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Information Retrieval

R.I.P. 👻 Ghosted

Graph Convolutional Neural Networks for Web-Scale Recommender Systems

Rex Ying, Ruining He, ... (+4 more)

cs.IR 🏛 KDD 📚 4.0K cites 8 years ago

🌅 🌅 Old Age

Neural Graph Collaborative Filtering

Xiang Wang, Xiangnan He, ... (+3 more)

cs.IR 🏛 SIGIR 📚 3.6K cites 7 years ago

R.I.P. 👻 Ghosted

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

Huifeng Guo, Ruiming Tang, ... (+3 more)

cs.IR 🏛 IJCAI 📚 3.0K cites 9 years ago

R.I.P. 👻 Ghosted

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Fei Sun, Jun Liu, ... (+5 more)

cs.IR 🏛 CIKM 📚 2.9K cites 7 years ago

R.I.P. 💀 404 Not Found

Graph Neural Networks for Social Recommendation

Wenqi Fan, Yao Ma, ... (+5 more)

cs.IR 🏛 WWW 📚 2.2K cites 7 years ago

R.I.P. 👻 Ghosted

Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding

Jiaxi Tang, Ke Wang

cs.IR 🏛 WSDM 📚 2.0K cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago