Adaptive Normalized Risk-Averting Training For Deep Neural Networks

June 08, 2015 · Entered Twilight · 🏛 AAAI Conference on Artificial Intelligence

"Last commit was 9.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitattributes, .gitignore, MNIST_lasagne.py, MNIST_output.png, Readme.txt

Authors: Zhiguang Wang, Tim Oates, James Lo
arXiv ID: 1506.02690
Category: cs.LG: Machine Learning
Cross-listed: cs.NE, stat.ML
Citations: 7
Venue: AAAI Conference on Artificial Intelligence
Repository: https://github.com/cauchyturing/ANRAE (⭐ 1)
Last Checked: 1 month ago

Abstract

This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $\lambda$, we explain why learning $\lambda$ adaptively using gradient descent works. In practice, we show how this method improves the training of deep neural networks on visual recognition tasks using the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and denoising auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. As an alternative to unsupervised pretraining, it provides a new perspective on the non-convex optimization problem in DNNs.
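To make the idea of a risk-averting error with a learnable convexity index more concrete, here is a minimal sketch in plain Python. It is not taken from the abstract or the repository. The log-mean-exp form of `nrae` is an assumption about what the criterion looks like, and the penalty term in `anrat_loss`, with its hypothetical hyperparameters `a` and `q`, is likewise illustrative. What the sketch does show with certainty is that a criterion of this form is bounded below by the mean $L_p$ error and approaches it as $\lambda \to 0$.

```python
import math

def mean_lp_error(preds, targets, p=2):
    """Standard L_p-style error: mean of |y - f(x)|^p over samples."""
    return sum(abs(y - f) ** p for f, y in zip(preds, targets)) / len(preds)

def nrae(preds, targets, lam, p=2):
    """Assumed normalized risk-averting error, for lam > 0:
    (1 / lam) * log( mean_i exp(lam * |y_i - f_i|^p) )."""
    errs = [lam * abs(y - f) ** p for f, y in zip(preds, targets)]
    m = max(errs)  # shift by the max so exp() cannot overflow
    log_mean = m + math.log(sum(math.exp(e - m) for e in errs) / len(errs))
    return log_mean / lam

def anrat_loss(preds, targets, lam, a=0.1, q=1, p=2):
    """Assumed adaptive objective: NRAE plus a penalty a * lam^(-q).
    The penalty (hypothetical form) keeps lam from collapsing to 0
    when lam is trained by gradient descent alongside the weights."""
    return nrae(preds, targets, lam, p) + a * lam ** (-q)

if __name__ == "__main__":
    preds, targets = [0.0, 0.5, 1.0], [0.0, 0.0, 0.0]
    print("mean L2 error:", mean_lp_error(preds, targets))
    for lam in (1e-6, 1.0, 200.0):
        print("lam =", lam, "NRAE =", nrae(preds, targets, lam))
```

Under these assumptions, a small $\lambda$ makes the criterion behave like the ordinary mean error, while a large $\lambda$ pushes it toward the worst per-sample error. That is one way to read "convexity index" here: the value of $\lambda$ controls how strongly large errors dominate the objective.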