Hierarchical Imitation Learning for Stochastic Environments
September 25, 2023 · Declared Dead · IEEE/RSJ International Conference on Intelligent RObots and Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson
arXiv ID
2309.14003
Category
cs.LG: Machine Learning
Cross-listed
cs.RO
Citations
3
Venue
IEEE/RSJ International Conference on Intelligent RObots and Systems
Last Checked
4 months ago
Abstract
Many applications of imitation learning require the agent to generate the full distribution of behaviour observed in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition the policy on types such as goals or personas that give rise to multi-modal behaviour. However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors, i.e., those under the agent's control, are encoded in the type. Encoding future information about external factors leads to inappropriate agent reactions during testing, when the future is unknown and types must be drawn independently from the actual future. We formalize this challenge as distribution shift in the conditional distribution of agent types under environmental stochasticity. We propose Robust Type Conditioning (RTC), which eliminates this shift with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.
Similar Papers
In the same crypt: Machine Learning
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way: 👻 Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted