MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint

February 22, 2024 Β· Declared Dead Β· πŸ› IEEE Transactions on Emerging Topics in Computational Intelligence

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Xinglin Zhou, Yifu Yuan, Shaofu Yang, Jianye Hao arXiv ID 2402.14244 Category cs.AI: Artificial Intelligence Cross-listed cs.HC, cs.LG Citations 6 Venue IEEE Transactions on Emerging Topics in Computational Intelligence Last Checked 4 months ago
Abstract
Hierarchical reinforcement learning (HRL) provides a promising solution for complex tasks with sparse rewards of intelligent agents, which uses a hierarchical framework that divides tasks into subgoals and completes them sequentially. However, current methods struggle to find suitable subgoals for ensuring a stable learning process. Without additional guidance, it is impractical to rely solely on exploration or heuristics methods to determine subgoals in a large goal space. To address the issue, We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints (MENTOR). MENTOR acts as a "mentor", incorporating human feedback into high-level policy learning, to find better subgoals. As for low-level policy, MENTOR designs a dual policy for exploration-exploitation decoupling respectively to stabilize the training. Furthermore, although humans can simply break down tasks into subgoals to guide the right learning direction, subgoals that are too difficult or too easy can still hinder downstream learning efficiency. We propose the Dynamic Distance Constraint (DDC) mechanism dynamically adjusting the space of optional subgoals. Thus MENTOR can generate subgoals matching the low-level policy learning process from easy to hard. Extensive experiments demonstrate that MENTOR uses a small amount of human feedback to achieve significant improvement in complex tasks with sparse rewards.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Artificial Intelligence

Died the same way β€” πŸ‘» Ghosted