Evolutionary Deep Reinforcement Learning Using Elite Buffer: A Novel Approach Towards DRL Combined with EA in Continuous Control Tasks

September 18, 2022 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Marzieh Sadat Esmaeeli, Hamed Malek · arXiv ID: 2209.08480 · Category: cs.NE (Neural & Evolutionary) · Citations: 2 · Venue: arXiv.org · Last Checked: 4 months ago

Abstract

Despite its numerous applications and successes in many control tasks, deep reinforcement learning still suffers from several crucial problems and limitations, including temporal credit assignment with sparse rewards, a lack of effective exploration, and brittle convergence that is extremely sensitive to the hyperparameters of the problem. These problems in continuous control, together with the success of evolutionary algorithms in addressing some of them, have given rise to the idea of evolutionary reinforcement learning, which has attracted considerable controversy. Despite successful results in a few studies in this field, a proper and fitting solution to these problems and limitations is yet to be presented. The present study further investigates the efficiency of combining deep reinforcement learning with evolutionary computation and takes a step towards improving existing methods and addressing the open challenges. The "Evolutionary Deep Reinforcement Learning Using Elite Buffer" algorithm introduces a novel mechanism inspired by the interactive learning capability and hypothetical outcomes in the human brain. In this method, the use of the elite buffer (inspired by learning based on experience generalization in the human mind), together with crossover and mutation operators and interactive learning across successive generations, improves efficiency and convergence and advances the state of continuous control. According to the experimental results, the proposed method surpasses other well-known methods in environments with high complexity and dimensionality and is superior in resolving the aforementioned problems and limitations.
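
The abstract names three ingredients (an elite buffer, crossover, and mutation) without giving their details, so the following is only a minimal sketch of how such a generational loop could be wired together, not the authors' algorithm. Everything here is an assumption made for illustration: the buffer is assumed to hold the best parameter vectors seen so far, `fitness` is a hypothetical stand-in for an episode return, crossover is uniform, mutation is Gaussian, and the gradient-based deep reinforcement learning component is omitted entirely.

```python
import random


def fitness(params):
    # Hypothetical stand-in for an episode return (assumption):
    # higher is better, with the maximum at the all-zero vector.
    return -sum(p * p for p in params)


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(params, rng, sigma=0.1):
    # Gaussian mutation applied to every gene.
    return [p + rng.gauss(0.0, sigma) for p in params]


def evolve(generations=50, pop_size=20, dim=4, buffer_size=5, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    elite_buffer = []  # assumed to store the best parameter vectors so far
    history = []       # best fitness in the buffer after each generation

    for _ in range(generations):
        # Merge the current population into the buffer and keep the top entries.
        candidates = elite_buffer + population
        elite_buffer = sorted(candidates, key=fitness, reverse=True)[:buffer_size]
        history.append(fitness(elite_buffer[0]))

        # Breed the next generation from parents drawn from the elite buffer.
        population = [
            mutate(crossover(rng.choice(elite_buffer),
                             rng.choice(elite_buffer), rng), rng)
            for _ in range(pop_size)
        ]
    return elite_buffer, history


if __name__ == "__main__":
    elites, history = evolve()
    print("best fitness per generation (first 5):", history[:5])
    print("final best fitness:", history[-1])
```

Because the buffer always retains its previous best entry, the best fitness recorded in `history` cannot decrease from one generation to the next in this sketch; whether the published method has the same property is not stated in the abstract.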