Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity

June 24, 2020 · Declared Dead · 🏛 Deep Reinforcement Learning Workshop at NeurIPS 2020

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Hassam Ullah Sheikh, Ladislau Bölöni
arXiv ID: 2006.13823
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, stat.ML
Citations: 0
Venue: Deep Reinforcement Learning Workshop at NeurIPS 2020
Last Checked: 4 months ago
Abstract
The classic DQN algorithm is limited by the overestimation bias of the learned Q-function. Subsequent algorithms have proposed techniques to reduce this bias without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms have used the different estimates provided by an ensemble of learners to reduce the overestimation bias. Unfortunately, these learners can converge to the same point in the parametric or representation space, falling back to the classic single-network DQN. In this paper, we describe a regularization technique to maximize ensemble diversity in these algorithms. We propose and compare five regularization functions inspired by economic theory and consensus optimization. We show that the regularized approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
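The abstract names two ingredients without spelling them out, so here is a rough sketch under stated assumptions. The `maxmin_target` function follows the commonly cited Maxmin Q-learning target (maximum over actions of the minimum estimate across learners). The diversity term is purely hypothetical: a Gini coefficient over per-learner parameter norms is used only to illustrate an economics-style inequality measure, and it is not claimed to be one of the paper's five regularizers or to match their form. All function names and the `weight` coefficient are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def maxmin_target(reward: float, next_q: Sequence[Sequence[float]],
                  gamma: float = 0.99, done: bool = False) -> float:
    """Maxmin-style TD target: r + gamma * max_a min_i Q_i(s', a).

    next_q[i][a] is learner i's estimate for action a in the next state.
    """
    if done:
        # Terminal transition: no bootstrapping.
        return reward
    num_actions = len(next_q[0])
    # For each action, keep the most pessimistic estimate across learners,
    # which counteracts the overestimation of a single max operator.
    pessimistic = [min(q[a] for q in next_q) for a in range(num_actions)]
    return reward + gamma * max(pessimistic)


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in vec))


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 when all are equal."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    pair_diffs = sum(abs(x - y) for x in values for y in values)
    return pair_diffs / (2 * n * total)


def regularized_loss(td_losses: Sequence[float],
                     learner_params: Sequence[Sequence[float]],
                     weight: float = 0.1) -> float:
    """HYPOTHETICAL: mean TD loss minus a weighted inequality bonus.

    Subtracting the Gini term means that minimizing this loss penalizes
    the equal-norm configuration that identical (collapsed) learners would
    produce. This is an illustration, not the paper's regularizer.
    """
    norms = [l2_norm(p) for p in learner_params]
    mean_td = sum(td_losses) / len(td_losses)
    return mean_td - weight * gini(norms)
```

For two learners with next-state estimates `[[1.0, 2.0], [3.0, 0.5]]`, reward 1.0 and gamma 0.5, the per-action minima are `[1.0, 0.5]`, so `maxmin_target` returns 1.0 + 0.5 * 1.0 = 1.5.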

📜 Similar Papers

In the same crypt: Machine Learning

Died the same way: 👻 Ghosted