A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

April 03, 2015 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton
arXiv ID: 1504.00941
Category: cs.NE (Neural & Evolutionary)
Cross-listed: cs.LG
Citations: 748
Venue: arXiv.org
Last Checked: 2 months ago

Abstract

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
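The initialization described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the abstract only specifies that the recurrent weight matrix starts as the identity (or a scaled identity) and that the units are rectified linear. The function names, the zero bias, and the small Gaussian input weights (`input_std`) are assumptions made here for the example.

```python
import random


def scaled_identity(n, scale=1.0):
    """Return an n x n matrix equal to scale * I (the recurrent init from the abstract)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def init_irnn(input_size, hidden_size, scale=1.0, input_std=0.001, seed=0):
    """Build parameters for a ReLU RNN with an identity-initialized recurrent matrix.

    Assumptions (not stated in the abstract): zero bias and small Gaussian
    input-to-hidden weights with standard deviation `input_std`.
    """
    rng = random.Random(seed)
    return {
        "W_hh": scaled_identity(hidden_size, scale),
        "W_xh": [[rng.gauss(0.0, input_std) for _ in range(input_size)]
                 for _ in range(hidden_size)],
        "b": [0.0] * hidden_size,
    }


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def irnn_step(params, h, x):
    """One recurrent step: h_next = relu(W_hh h + W_xh x + b)."""
    rec = matvec(params["W_hh"], h)
    inp = matvec(params["W_xh"], x)
    return [max(0.0, r + i + b) for r, i, b in zip(rec, inp, params["b"])]


def run_irnn(params, h0, inputs):
    """Apply irnn_step over a sequence of input vectors and return the final state."""
    h = list(h0)
    for x in inputs:
        h = irnn_step(params, h, x)
    return h


# Demo: with zero input, an identity recurrent matrix copies a
# non-negative hidden state forward through every step.
params = init_irnn(input_size=2, hidden_size=3)
final_state = run_irnn(params, [1.0, 2.0, 0.0], [[0.0, 0.0]] * 50)
print(final_state)
```

With the recurrent matrix at the identity and no input, each step leaves a non-negative hidden state unchanged, so information (and the corresponding gradient) is neither shrunk nor amplified at initialization. Passing `scale` below 1 makes the state decay geometrically, which corresponds to the scaled variant the abstract mentions.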

📜 Similar Papers

In the same crypt — Neural & Evolutionary

R.I.P. 👻 Ghosted

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, Samuli Laine, Timo Aila

cs.NE 🏛 CVPR 📚 12.3K cites 7 years ago

R.I.P. 👻 Ghosted

Progressive Growing of GANs for Improved Quality, Stability, and Variation

Tero Karras, Timo Aila, ... (+2 more)

cs.NE 🏛 ICLR 📚 8.2K cites 8 years ago

R.I.P. 👻 Ghosted

Learning both Weights and Connections for Efficient Neural Networks

Song Han, Jeff Pool, ... (+2 more)

cs.NE 🏛 NeurIPS 📚 7.4K cites 10 years ago

R.I.P. 👻 Ghosted

LSTM: A Search Space Odyssey

Klaus Greff, Rupesh Kumar Srivastava, ... (+3 more)

cs.NE 🏛 IEEE TNNLS 📚 6.0K cites 11 years ago

R.I.P. 👻 Ghosted

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

Dan Hendrycks, Kevin Gimpel

cs.NE 🏛 ICLR 📚 4.0K cites 9 years ago

R.I.P. 👻 Ghosted

An Introduction to Convolutional Neural Networks

Keiron O'Shea, Ryan Nash

cs.NE 🏛 arXiv 📚 3.8K cites 10 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago