Sharpness-Aware Minimization and the Edge of Stability

September 21, 2023 · Declared Dead · 🏛 Journal of Machine Learning Research

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Philip M. Long, Peter L. Bartlett
arXiv ID: 2309.12488
Category: cs.LG (Machine Learning); cross-listed: cs.NE, stat.ML
Citations: 15
Venue: Journal of Machine Learning Research
Last checked: 4 months ago
Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
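The abstract does not spell out the quadratic calculation, and this listing records no released code, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the usual SAM update $w_{t+1} = w_t - \eta\,\nabla\ell\big(w_t + \rho\,\nabla\ell(w_t)/\|\nabla\ell(w_t)\|\big)$ with a perturbation radius $\rho$ (a symbol not used in the abstract), a quadratic loss with Hessian $H$, and $\|\nabla\ell\|$ treated as fixed over one step. Along an eigenvector of $H$ with eigenvalue $\lambda$, GD multiplies the corresponding coordinate by $1 - \eta\lambda$, which stays in $(-1, 1)$ exactly when $\lambda < 2/\eta$. Under the same model, SAM multiplies it by $1 - \eta\lambda(1 + \rho\lambda/\|\nabla\ell\|)$; setting this to $-1$ and taking the positive root gives a curvature threshold of $\frac{\|\nabla\ell\|}{2\rho}\big(\sqrt{1 + 8\rho/(\eta\|\nabla\ell\|)} - 1\big)$, which depends on the gradient norm and tends to $2/\eta$ as $\rho \to 0$. The helper names (`gd_edge`, `sam_edge`, `sam_multiplier`, `run_gd_on_quadratic`) are hypothetical.

```python
import math

# Hypothetical helpers illustrating a local quadratic model; not the paper's code.


def gd_edge(eta: float) -> float:
    """GD threshold: the multiplier 1 - eta*lam stays in (-1, 1) iff lam < 2/eta."""
    return 2.0 / eta


def sam_multiplier(eta: float, rho: float, grad_norm: float, lam: float) -> float:
    """Assumed per-step SAM scaling of the coordinate along an eigenvector with
    eigenvalue lam, holding the full gradient norm grad_norm fixed."""
    return 1.0 - eta * lam * (1.0 + rho * lam / grad_norm)


def sam_edge(eta: float, rho: float, grad_norm: float) -> float:
    """Positive root of (eta*rho/grad_norm)*lam**2 + eta*lam - 2 = 0,
    i.e. the lam at which sam_multiplier reaches -1."""
    a = rho / grad_norm
    return (math.sqrt(1.0 + 8.0 * a / eta) - 1.0) / (2.0 * a)


def run_gd_on_quadratic(eta: float, lam: float, x0: float = 1.0, steps: int = 50) -> float:
    """Run plain GD on f(x) = lam/2 * x**2 and return |x| after `steps` updates."""
    x = x0
    for _ in range(steps):
        x -= eta * lam * x  # gradient of lam/2 * x**2 is lam * x
    return abs(x)


if __name__ == "__main__":
    eta, rho = 0.1, 0.05
    print("GD edge 2/eta:", gd_edge(eta))
    for g in (0.1, 1.0, 10.0):
        print(f"SAM edge with gradient norm {g}:", round(sam_edge(eta, rho, g), 3))
    # Curvature just below vs. just above 2/eta: GD shrinks, then diverges.
    print(run_gd_on_quadratic(eta, 19.0), run_gd_on_quadratic(eta, 21.0))
```

With $\eta = 0.1$ and $\rho = 0.05$, this sketch's SAM threshold sits well below $2/\eta = 20$ when the gradient norm is small (about 5.4 at norm 0.1) and approaches it as the norm grows (about 18.3 at norm 10), which is one way to read the abstract's remark that the SAM-edge depends on the norm of the gradient.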

📜 Similar Papers

In the same crypt: Machine Learning

Died the same way: 👻 Ghosted