Adaptive scaling of the learning rate by second order automatic differentiation

October 26, 2022 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Frédéric de Gournay, Alban Gossard
arXiv ID: 2210.14520
Category: cs.NE (Neural & Evolutionary)
Citations: 2
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the curvature, a piece of second-order information whose computational complexity lies between that of the gradient and that of the Hessian-vector product. If (1C, 1M) denotes the computational time and memory footprint, respectively, of the gradient method, the new technique increases the overall cost to either (1.5C, 2M) or (2C, 1M). This rescaling has the appealing characteristic of having a natural interpretation: it allows the practitioner to choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive: it depends on the data and on the direction of descent. The numerical experiments highlight the different exploration/convergence regimes.
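The abstract does not give the rescaling formula, so the following is only a minimal sketch of one plausible reading: take the descent direction d, measure the slope and curvature of t -> loss(w + t*d) at t = 0, and use rate = -slope / curvature as the learning rate. The curvature is obtained here with a small forward-mode second-order differentiation class (`Jet`), which is a stand-in and not the authors' technique; it does not reproduce the (1.5C, 2M) or (2C, 1M) costs quoted above. The names `Jet`, `along`, `gradient`, `rescaled_step`, the toy `DATA`, and the least-squares `loss` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Jet:
    """Value, first and second derivative of t -> f(w + t*d) at t = 0."""
    v: float
    d1: float = 0.0
    d2: float = 0.0

    def __add__(self, other):
        o = _lift(other)
        return Jet(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)

    __radd__ = __add__

    def __sub__(self, other):
        o = _lift(other)
        return Jet(self.v - o.v, self.d1 - o.d1, self.d2 - o.d2)

    def __mul__(self, other):
        o = _lift(other)
        # Leibniz rule up to second order: (ab)'' = a''b + 2a'b' + ab''.
        return Jet(
            self.v * o.v,
            self.d1 * o.v + self.v * o.d1,
            self.d2 * o.v + 2.0 * self.d1 * o.d1 + self.v * o.d2,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivatives)."""
    return x if isinstance(x, Jet) else Jet(float(x))


# Toy data (an assumption for illustration): points on y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]


def loss(w):
    """Least-squares loss of the linear model w[0]*x + w[1]."""
    total = Jet(0.0)
    for x, y in DATA:
        r = w[0] * x + w[1] - y
        total = total + 0.5 * r * r
    return total


def along(w, d):
    """Return (loss, slope, curvature) of t -> loss(w + t*d) at t = 0."""
    out = loss([Jet(wi, di, 0.0) for wi, di in zip(w, d)])
    return out.v, out.d1, out.d2


def gradient(w):
    """Gradient assembled from one directional derivative per coordinate."""
    n = len(w)
    return [
        along(w, [1.0 if j == i else 0.0 for j in range(n)])[1]
        for i in range(n)
    ]


def rescaled_step(w):
    """One descent step whose learning rate is set by slope and curvature."""
    d = [-g for g in gradient(w)]
    _, slope, curvature = along(w, d)
    if curvature <= 0.0:
        raise ValueError("non-positive curvature: this sketch has no fallback")
    rate = -slope / curvature  # minimizer of the local quadratic model along d
    return [wi + rate * di for wi, di in zip(w, d)], rate


if __name__ == "__main__":
    w_new, rate = rescaled_step([0.0, 0.0])
    print("learning rate:", rate)
    print("loss before:", along([0.0, 0.0], [0.0, 0.0])[0])
    print("loss after:", along(w_new, [0.0, 0.0])[0])
```

Because the rate depends on both the data (through the loss) and the chosen direction d, it changes at every step, which is the sense in which such a rescaling is adaptive. Multiplying the rate by a factor above or below 1 would be one way to lean toward exploration or convergence, though the paper's own parametrization may differ.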



📜 Similar Papers

In the same crypt — Neural & Evolutionary

🔮 The Ethereal

LSTM: A Search Space Odyssey

Klaus Greff, Rupesh Kumar Srivastava, ... (+3 more)

cs.NE 🏛 IEEE TNNLS 📚 6.0K cites 11 years ago

R.I.P. 👻 Ghosted

Deep Learning using Rectified Linear Units (ReLU)

Abien Fred Agarap

cs.NE 🏛 arXiv 📚 3.8K cites 8 years ago

R.I.P. 👻 Ghosted

Generative Adversarial Text to Image Synthesis

Scott Reed, Zeynep Akata, ... (+4 more)

cs.NE 🏛 ICML 📚 3.4K cites 10 years ago

R.I.P. 👻 Ghosted

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, ... (+2 more)

cs.NE 🏛 AAAI 📚 3.2K cites 8 years ago

R.I.P. 👻 Ghosted

Temporal Ensembling for Semi-Supervised Learning

Samuli Laine, Timo Aila

cs.NE 🏛 ICLR 📚 2.8K cites 9 years ago

🌅 Old Age

Learning Structured Sparsity in Deep Neural Networks

Wei Wen, Chunpeng Wu, ... (+3 more)

cs.NE 🏛 NeurIPS 📚 2.5K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago