Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
March 11, 2019 · Entered Twilight · BNAIC/BENELEARN
"Last commit was 6.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, COPYING, README.md, avg_stats.py, bdpi.py, benchmark.py, copy_results.sh, experiments_gym.sh, gym_envs, main.py, paper.pdf, pool.py, poster.pdf, poster.png, results, task_speed.py
Authors
Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé
arXiv ID
1903.04193
Category
cs.LG: Machine Learning
Cross-listed
cs.AI
Citations
18
Venue
BNAIC/BENELEARN
Repository
https://github.com/vub-ai-lab/bdpi
⭐ 25
Last Checked
4 months ago
Abstract
Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable, and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete, continuous and pixel-based tasks. Source code: https://github.com/vub-ai-lab/bdpi.
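The abstract's description of the actor, which slowly imitates the average greedy policy of several critics, can be illustrated with a small tabular sketch. This is not taken from the repository: the function names, the tabular Q-tables, and the `actor_lr` step size are all hypothetical, and the update is written as a plain convex mix between the current policy and the critics' averaged greedy policies, which is one simple reading of "slowly imitating".

```python
import numpy as np


def greedy_policy(q_values):
    """Return a one-hot greedy action distribution for each state (row)."""
    greedy = np.zeros_like(q_values, dtype=float)
    greedy[np.arange(q_values.shape[0]), q_values.argmax(axis=1)] = 1.0
    return greedy


def actor_update(pi, critics_q, actor_lr=0.05):
    """Move the actor a small step toward the critics' average greedy policy.

    pi:        (n_states, n_actions) array of action probabilities.
    critics_q: list of (n_states, n_actions) Q-tables, one per critic.
    actor_lr:  hypothetical step size; small values give slow imitation.
    """
    # Average the one-hot greedy policies of all critics.
    avg_greedy = np.mean([greedy_policy(q) for q in critics_q], axis=0)
    # Convex combination keeps every row a valid probability distribution.
    return (1.0 - actor_lr) * pi + actor_lr * avg_greedy


if __name__ == "__main__":
    # Two states, three actions, uniform initial actor (illustrative values).
    pi = np.full((2, 3), 1.0 / 3.0)
    critics = [
        np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]),
        np.array([[0.5, 0.1, 0.0], [0.0, 0.0, 3.0]]),
    ]
    print(actor_update(pi, critics, actor_lr=0.1))
```

In this sketch, states where the critics agree on the greedy action see the actor's probability mass shift toward that action, while states where they disagree keep the mass spread across the candidate actions. That is one way to picture the state-specific exploration the abstract mentions.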
Similar Papers
In the same crypt: Machine Learning
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal