A Regularized Opponent Model with Maximum Entropy Objective
May 17, 2019 · Declared Dead · International Joint Conference on Artificial Intelligence
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang
arXiv ID
1905.08087
Category
cs.MA: Multiagent Systems
Cross-listed
cs.AI, cs.LG
Citations
35
Venue
International Joint Conference on Artificial Intelligence
Last Checked
2 months ago
Abstract
In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, both theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on the challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.
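The abstract describes an objective that combines expected reward, a maximum-entropy term on the agent's policy, and a regularizer on the opponent model. The sketch below illustrates one plausible decomposition of such an objective for a one-shot two-player matrix game; the exact form in the paper may differ, and all function and variable names here are hypothetical, not taken from the authors' code.

```python
import numpy as np

def rommeo_style_objective(reward, pi, rho, prior, alpha=1.0):
    """Sketch of a maximum-entropy objective with an opponent-model
    regularizer for a one-shot two-player matrix game.

    reward: R[i, j], payoff when we play action i and the opponent plays j
    pi:     our conditional policy pi(a | a_opp), shape (n_opp, n_self)
    rho:    our opponent model rho(a_opp), shape (n_opp,)
    prior:  prior over opponent actions P(a_opp), shape (n_opp,)

    Hypothetical decomposition (an assumption, not the paper's exact form):
    E[r] + alpha * E[H(pi(.|a_opp))] - alpha * KL(rho || prior)
    """
    n_opp = len(rho)
    # Expected reward under the opponent model rho and our policy pi
    expected_r = sum(rho[j] * pi[j] @ reward[:, j] for j in range(n_opp))
    # Conditional entropy of our policy, averaged over the opponent model
    entropy = -sum(rho[j] * (pi[j] * np.log(pi[j] + 1e-12)).sum()
                   for j in range(n_opp))
    # KL regularizer keeping the opponent model close to the prior
    kl = (rho * np.log((rho + 1e-12) / (prior + 1e-12))).sum()
    return expected_r + alpha * entropy - alpha * kl
```

With uniform policy, opponent model, and prior on a 2x2 identity payoff, the KL term vanishes and the objective reduces to the expected reward plus the policy entropy, showing how the entropy bonus rewards stochastic play while the KL term penalizes opponent models that drift from the prior.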
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt (Multiagent Systems), all Ghosted:
Mean Field Multi-Agent Reinforcement Learning
A Survey and Critique of Multiagent Deep Reinforcement Learning
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
Collaborative vehicle routing: a survey
Deep Reinforcement Learning for Swarm Systems
Died the same way (Ghosted):
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System