Robust Asymmetric Learning in POMDPs

December 31, 2020 Β· Declared Dead Β· πŸ› International Conference on Machine Learning

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood arXiv ID 2012.15566 Category cs.LG: Machine Learning Cross-listed stat.ML Citations 37 Venue International Conference on Machine Learning Last Checked 2 months ago
Abstract
Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Machine Learning

Died the same way β€” πŸ‘» Ghosted