Leveraging heterogeneous spillover in maximizing contextual bandit rewards
October 16, 2023 ยท Declared Dead ยท ๐ The Web Conference
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ahmed Sayeed Faruk, Elena Zheleva
arXiv ID
2310.10259
Category
cs.LG: Machine Learning
Cross-listed
cs.SI
Citations
2
Venue
The Web Conference
Last Checked
4 months ago
Abstract
Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of bandit algorithms is to learn the best arm (e.g., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. The context that these algorithms typically consider are the user and item attributes. However, in the context of social networks where $\textit{the action of one user can influence the actions and rewards of other users,}$ neighbors' actions are also a very important context, as they can have not only predictive power but also can impact future rewards through spillover. Moreover, influence susceptibility can vary for different people based on their preferences and the closeness of ties to other users which leads to heterogeneity in the spillover effects. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore the network information and potential spillover.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted