Which Minimizer Does My Neural Network Converge To?

November 04, 2020 · Declared Dead · 🏛 ECML/PKDD

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Manuel Nonnenmacher, David Reeb, Ingo Steinwart

arXiv ID: 2011.02408

Category: stat.ML: Machine Learning (Stat)

Cross-listed: cs.LG

Citations: 5

Venue: ECML/PKDD

Last Checked: 4 months ago

Abstract

The loss surface of an overparameterized neural network (NN) possesses many global minima of zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the size of the initialization of a strongly overparameterized NN affects the minimizer and can deteriorate its final test performance. We propose a strategy to limit this effect. Then, we demonstrate that for adaptive optimization such as AdaGrad, the obtained minimizer generally differs from the gradient descent (GD) minimizer. This adaptive minimizer is changed further by stochastic mini-batch training, even though in the non-adaptive case, GD and stochastic GD result in essentially the same minimizer. Lastly, we explain that these effects remain relevant for less overparameterized NNs. While overparameterization has its benefits, our work highlights that it induces sources of error absent from underparameterized models.
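The abstract's first claim, that the size of the initialization changes which zero-error minimizer is reached, can be illustrated on a much simpler stand-in. The sketch below is not the paper's experiment: it assumes an overparameterized linear least-squares model instead of an NN, and all names (`gd_minimizer`, `closest_interpolator`, `scale`) are hypothetical. For such a model, gradient descent is known to converge to the interpolating solution closest to the starting point, so a larger initialization leaves a larger component in the final weights that the training data never corrects.

```python
import numpy as np


def gd_minimizer(X, y, w0, steps=2000):
    """Plain gradient descent on 0.5 * ||X w - y||^2, starting from w0."""
    # Step size 1 / lambda_max(X^T X) guarantees convergence for this quadratic.
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w


def closest_interpolator(X, y, w0):
    """Closed form: the zero-loss solution nearest to w0 in Euclidean norm."""
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)


rng = np.random.default_rng(0)
n, d = 5, 50                      # 5 samples, 50 parameters: overparameterized
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

results = {}
for scale in (0.0, 1.0):          # hypothetical "initialization size" knob
    w0 = scale * rng.normal(size=d)
    results[scale] = gd_minimizer(X, y, w0)
    train_err = np.linalg.norm(X @ results[scale] - y)
    print(f"scale={scale}: train error={train_err:.2e}, "
          f"||w||={np.linalg.norm(results[scale]):.3f}")
```

Both runs fit the training data, yet they end at different weight vectors: the zero-initialized run lands on the minimum-norm interpolator, while the run with nonzero initialization keeps the part of `w0` that lies in the null space of `X`. Whether and how this carries over to NNs is the subject of the paper itself.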

📜 Similar Papers

In the same crypt — Machine Learning (Stat)

🔮 The Ethereal

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, Jeff Dean

stat.ML 🏛 arXiv 📚 22.9K cites 11 years ago

🔮 The Ethereal

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

stat.ML 🏛 arXiv 📚 12.0K cites 9 years ago

🔮 The Ethereal

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell

stat.ML 🏛 NeurIPS 📚 7.0K cites 9 years ago

R.I.P. 👻 Ghosted

Variational Inference with Normalizing Flows

Danilo Jimenez Rezende, Shakir Mohamed

stat.ML 🏛 ICML 📚 4.7K cites 11 years ago

📚 The Cartographer

Towards A Rigorous Science of Interpretable Machine Learning

Finale Doshi-Velez, Been Kim

stat.ML 🏛 arXiv 📚 4.7K cites 9 years ago

R.I.P. 👻 Ghosted

Optimization Methods for Large-Scale Machine Learning

Léon Bottou, Frank E. Curtis, Jorge Nocedal

stat.ML 🏛 SIAM Review 📚 3.6K cites 10 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago