Reward Estimation for Variance Reduction in Deep Reinforcement Learning
May 09, 2018 ยท Declared Dead ยท ๐ Conference on Robot Learning
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Joshua Romoff, Peter Henderson, Alexandre Pichรฉ, Vincent Francois-Lavet, Joelle Pineau
arXiv ID
1805.03359
Category
cs.LG: Machine Learning
Cross-listed
cs.AI,
stat.ML
Citations
43
Venue
Conference on Robot Learning
Last Checked
4 months ago
Abstract
Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, introduction of corrupt or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochasticity of the reward signal can be especially problematic in robotics, where goal specification can be particularly difficult for complex tasks. While many variance reduction techniques have been studied to improve the robustness of the RL process, handling such stochastic or corrupted reward structures remains difficult. As an alternative for handling this scenario in model-free RL methods, we suggest using an estimator for both rewards and value functions. We demonstrate that this improves performance under corrupted stochastic rewards in both the tabular and non-linear function approximation settings for a variety of noise types and environments. The use of reward estimation is a robust and easy-to-implement improvement for handling corrupted reward signals in model-free RL.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted