Latent Space Policies for Hierarchical Reinforcement Learning

April 09, 2018 · Entered Twilight · 🏛 International Conference on Machine Learning

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 7.0 years ago (≥ 5-year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, DIAYN.md, Dockerfile, LICENSE.txt, README.md, docker-compose.yaml, environment.yml, examples, sac, scripts, tests

Authors: Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine
arXiv ID: 1804.02808
Category: cs.LG (Machine Learning); cross-listed cs.AI, stat.ML
Citations: 203
Venue: International Conference on Machine Learning
Repository: https://github.com/haarnoja/sac (⭐ 1212)
Last Checked: 1 month ago
Abstract
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
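The abstract describes each layer as an invertible, observation-conditioned map from a latent variable to that layer's output, with a higher layer steering the lower one by choosing its latent. Below is a minimal NumPy sketch of that structure. It assumes Real NVP-style affine coupling bijectors with toy linear conditioners and a standard normal latent prior. The class names (`AffineCoupling`, `LatentPolicyLayer`), the tanh-bounded log-scale, and the random weights are hypothetical illustrations, not code from the haarnoja/sac repository. Training and the maximum entropy objective are not shown.

```python
import numpy as np


class AffineCoupling:
    """Hypothetical Real NVP-style coupling bijector conditioned on an observation.

    Assumption for illustration: tiny linear conditioners with random weights.
    """

    def __init__(self, dim, obs_dim, flip, seed):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        self.flip = flip  # reverse the vector first so alternating layers mix both halves
        in_dim = self.half + obs_dim
        out_dim = dim - self.half
        self.W_s = 0.1 * rng.standard_normal((in_dim, out_dim))
        self.W_t = 0.1 * rng.standard_normal((in_dim, out_dim))

    def _scale_shift(self, x1, obs):
        inp = np.concatenate([x1, obs])
        s = np.tanh(inp @ self.W_s)  # bounded log-scale keeps exp(s) well-conditioned
        t = inp @ self.W_t
        return s, t

    def forward(self, h, obs):
        """Map h -> y; return y and log|det dy/dh|."""
        x = h[::-1] if self.flip else h
        x1, x2 = x[:self.half], x[self.half:]
        s, t = self._scale_shift(x1, obs)
        y = np.concatenate([x1, x2 * np.exp(s) + t])
        return y, float(np.sum(s))

    def inverse(self, y, obs):
        """Recover h from y in closed form."""
        y1, y2 = y[:self.half], y[self.half:]
        s, t = self._scale_shift(y1, obs)
        x = np.concatenate([y1, (y2 - t) * np.exp(-s)])
        return x[::-1] if self.flip else x


class LatentPolicyLayer:
    """One level of the hierarchy (hypothetical sketch): a bijective,
    observation-conditioned map from a latent vector to this layer's output."""

    def __init__(self, dim, obs_dim, n_couplings=2, seed=0):
        self.couplings = [
            AffineCoupling(dim, obs_dim, flip=(i % 2 == 1), seed=seed + i)
            for i in range(n_couplings)
        ]

    def act(self, h, obs):
        """Push a latent through all couplings; return output and total log-det."""
        log_det = 0.0
        for c in self.couplings:
            h, ld = c.forward(h, obs)
            log_det += ld
        return h, log_det

    def latent_for(self, a, obs):
        """Invert the layer: the latent that produces output a."""
        for c in reversed(self.couplings):
            a = c.inverse(a, obs)
        return a

    def log_prob(self, a, obs):
        """Change of variables, assuming a standard normal prior over the latent."""
        h = self.latent_for(a, obs)
        _, log_det = self.act(h, obs)
        log_prior = -0.5 * np.sum(h ** 2) - 0.5 * h.size * np.log(2 * np.pi)
        return log_prior - log_det


# Usage: a two-level hierarchy in which the upper layer picks the lower layer's latent.
rng = np.random.default_rng(42)
obs = rng.standard_normal(3)
low = LatentPolicyLayer(dim=4, obs_dim=3, seed=0)    # e.g. a previously trained skill layer
high = LatentPolicyLayer(dim=4, obs_dim=3, seed=10)  # layer trained on top of it
h_high = rng.standard_normal(4)   # sample from the top layer's prior
h_low, _ = high.act(h_high, obs)  # top layer outputs the lower layer's latent
action, _ = low.act(h_low, obs)   # lower layer maps its latent to an action
```

Because each coupling only rescales and shifts half of the vector using the other half, the inverse exists in closed form. This is what lets a higher layer reach any action through the lower layer's latent. The log-determinant is simply the sum of the log-scales, which is the quantity an entropy-regularized objective would need in order to evaluate action log-likelihoods.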