Gradual DropIn of Layers to Train Very Deep Neural Networks

November 22, 2015 · Declared Dead · 🏛 Computer Vision and Pattern Recognition

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Leslie N. Smith, Emily M. Hand, Timothy Doster
arXiv ID: 1511.06951
Category: cs.NE: Neural & Evolutionary
Cross-listed: cs.CV, cs.LG
Citations: 37
Venue: Computer Vision and Pattern Recognition
Last Checked: 3 months ago

Abstract

We introduce the concept of dynamically growing a neural network during training. In particular, an untrainable deep network starts as a trainable shallow network, and new layers are slowly, organically added during training, thereby increasing the network's depth. This is accomplished by a new layer, which we call DropIn. The DropIn layer starts by passing the output from a previous layer (effectively skipping over the newly added layers), then increasingly includes units from the new layers for both feedforward and backpropagation. We show that deep networks, which are untrainable with conventional methods, will converge with DropIn layers interspersed in the architecture. In addition, we demonstrate that DropIn provides regularization during training in a way analogous to dropout. Experiments are described with the MNIST dataset and various expanded LeNet architectures, the CIFAR-10 dataset with its architecture expanded from 3 to 11 layers, and the ImageNet dataset with the AlexNet architecture expanded to 13 layers and the VGG 16-layer architecture.
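The mechanism the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how units are selected or how quickly they are included, so the per-unit random mask and the linear ramp below are assumptions, and the names `dropin`, `dropin_probability`, and `ramp_steps` are hypothetical.

```python
import random


def dropin_probability(step, ramp_steps):
    """Assumed linear schedule: 0.0 at step 0, rising to 1.0 at ramp_steps."""
    if ramp_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / ramp_steps))


def dropin(skip_input, new_output, p, rng):
    """Mix two activation vectors of equal length, unit by unit.

    With probability p a unit takes the activation produced by the newly
    added layers; otherwise it passes through the activation from the
    previous layer, skipping the new layers. The returned mask records the
    choice so a backward pass could route gradients the same way.
    """
    if len(skip_input) != len(new_output):
        raise ValueError("skip and new activations must have the same length")
    mask = [rng.random() < p for _ in skip_input]
    mixed = [n if m else s for s, n, m in zip(skip_input, new_output, mask)]
    return mixed, mask


if __name__ == "__main__":
    rng = random.Random(0)
    skip = [1.0, 2.0, 3.0, 4.0]       # output of the previous (trained) layer
    new = [10.0, 20.0, 30.0, 40.0]    # output of the newly added layers
    for step in (0, 500, 1000):
        p = dropin_probability(step, ramp_steps=1000)
        mixed, _ = dropin(skip, new, p, rng)
        print(step, p, mixed)
```

At `p = 0` the layer is a pure skip connection, so the network behaves like the shallow one; at `p = 1` every unit comes from the new layers, so the full depth is in use. The random mixing in between is one plausible reading of the dropout-like regularization the abstract mentions.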

📜 Similar Papers

In the same crypt — Neural & Evolutionary

🔮 The Ethereal

LSTM: A Search Space Odyssey

Klaus Greff, Rupesh Kumar Srivastava, ... (+3 more)

cs.NE 🏛 IEEE TNNLS 📚 6.0K cites 11 years ago

R.I.P. 👻 Ghosted

Deep Learning using Rectified Linear Units (ReLU)

Abien Fred Agarap

cs.NE 🏛 arXiv 📚 3.8K cites 8 years ago

R.I.P. 👻 Ghosted

Generative Adversarial Text to Image Synthesis

Scott Reed, Zeynep Akata, ... (+4 more)

cs.NE 🏛 ICML 📚 3.4K cites 10 years ago

R.I.P. 👻 Ghosted

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, ... (+2 more)

cs.NE 🏛 AAAI 📚 3.2K cites 8 years ago

R.I.P. 👻 Ghosted

Temporal Ensembling for Semi-Supervised Learning

Samuli Laine, Timo Aila

cs.NE 🏛 ICLR 📚 2.8K cites 9 years ago

🌅 Old Age

Learning Structured Sparsity in Deep Neural Networks

Wei Wen, Chunpeng Wu, ... (+3 more)

cs.NE 🏛 NeurIPS 📚 2.5K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago