The Ethereal
Decoupled Weight Decay for Any $p$ Norm
April 16, 2024 · Entered Twilight · arXiv.org
Repo contents: .gitignore, README.md, python
Authors
Nadav Joseph Outmezguine, Noam Levi
arXiv ID
2404.10824
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.NE, math.OC
Citations
4
Venue
arXiv.org
Repository
https://github.com/Nadav-out/PAdam
Last Checked
3 months ago
Abstract
With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements. Sparsification has consequently emerged as a leading approach to tackle these issues. In this work, we consider a simple yet effective approach to sparsification, based on the Bridge, or $L_p$ regularization during training. We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm. We show that this scheme is compatible with adaptive optimizers, and avoids the gradient divergence associated with $0<p<1$ norms. We empirically demonstrate that it leads to highly sparse networks, while maintaining generalization performance comparable to standard $L_2$ regularization.
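The gradient divergence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' weight decay scheme or code from the linked repository; it only evaluates the textbook derivative of the penalty term $|w|^p$, and the function names `lp_penalty` and `lp_penalty_grad` are hypothetical, chosen for this example.

```python
import math


def lp_penalty(weights, p):
    """Return sum_i |w_i|^p, the L_p penalty (without any lambda prefactor)."""
    return sum(abs(w) ** p for w in weights)


def lp_penalty_grad(w, p):
    """Return d/dw |w|^p = p * |w|^(p-1) * sign(w), valid for w != 0."""
    return p * abs(w) ** (p - 1) * math.copysign(1.0, w)


if __name__ == "__main__":
    # Compare how the penalty gradient behaves as a weight shrinks toward zero.
    for w in (1e-2, 1e-4, 1e-6):
        print(f"w={w:.0e}  p=2: {lp_penalty_grad(w, 2.0):.3e}  "
              f"p=0.5: {lp_penalty_grad(w, 0.5):.3e}")
```

For $p=2$ the gradient shrinks linearly with the weight, whereas for $p=0.5$ it grows like $|w|^{-1/2}$ as the weight approaches zero, which is why a naive gradient step on the penalty becomes unstable for $0<p<1$.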
Similar Papers
In the same crypt: Machine Learning
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal