A Continuous-Time View of Early Stopping for Least Squares
October 23, 2018 Β· Declared Dead Β· π International Conference on Artificial Intelligence and Statistics
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani
arXiv ID
1810.10082
Category
stat.ML: Machine Learning (Stat)
Cross-listed
cs.LG
Citations
105
Venue
International Conference on Artificial Intelligence and Statistics
Last Checked
1 month ago
Abstract
We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t=1/Ξ»$---where $t$ is the time parameter in gradient flow, and $Ξ»$ the tuning parameter in ridge regression---we prove that the risk of gradient flow is no less than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $Ξ²_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Machine Learning (Stat)
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Distilling the Knowledge in a Neural Network
R.I.P.
π»
Ghosted
Layer Normalization
R.I.P.
π»
Ghosted
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
R.I.P.
π»
Ghosted
Domain-Adversarial Training of Neural Networks
R.I.P.
π»
Ghosted
Deep Learning with Differential Privacy
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted