Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

March 23, 2020 · Entered Twilight · 🏛 International Conference on Learning Representations

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, __init__.py, maddpg_o, mpe_local, requirements.txt, train_adversarial_epc.sh, train_adversarial_vpc.sh, train_food_collect_epc.sh, train_food_collect_vpc.sh, train_grassland_att.sh, train_grassland_epc.sh, train_grassland_vpc.sh

Authors: Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, Xiaolong Wang
arXiv ID: 2003.10423
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.NE, cs.RO, stat.ML
Citations: 86
Venue: International Conference on Learning Representations
Repository: https://github.com/qian18long/epciclr2020 (⭐ 123)
Last checked: 1 month ago
Abstract
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially.
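The abstract's stage-wise procedure (maintain multiple agent sets, mix-and-match them, fine-tune, and promote the best adapters) can be sketched roughly as below. This is a minimal illustration of the selection loop only, not the repository's implementation; every name (`grow`, `fine_tune`, `evaluate`, `k_keep`) is a hypothetical stand-in, and the actual fine-tuning in the paper is done with MADDPG:

```python
import itertools

def epc_stage(parent_sets, grow, fine_tune, evaluate, k_keep):
    """One EPC curriculum stage: scale the population, then select by adaptability.

    parent_sets : agent sets promoted from the previous (smaller-population) stage
    grow        : combines two parent sets into one candidate for the larger population
    fine_tune   : trains a candidate set in the scaled environment (e.g. via MADDPG)
    evaluate    : returns a fitness score for a fine-tuned candidate set
    k_keep      : number of sets promoted to the next stage
    """
    # Mix-and-match: pair parent sets (with replacement) to form scaled candidates.
    candidates = [grow(a, b)
                  for a, b in itertools.combinations_with_replacement(parent_sets, 2)]
    # Fine-tune each candidate against the enlarged agent population.
    tuned = [fine_tune(c) for c in candidates]
    # Evolutionary selection: promote the sets that adapt best to the new scale.
    tuned.sort(key=evaluate, reverse=True)
    return tuned[:k_keep]
```

With toy stand-ins (sets as lists, `grow` as concatenation, `evaluate` as a sum) the loop shows the intended behavior: candidates are bred from all pairs, and only the fittest survive into the next, larger stage.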
Community shame: Not yet rated
