Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

December 25, 2015 · Declared Dead · 🏛 International Conference on Artificial Intelligence and Statistics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin
arXiv ID: 1512.07962
Category: stat.ML, Machine Learning (Stat)
Cross-listed: cs.LG
Citations: 93
Venue: International Conference on Artificial Intelligence and Statistics
Last Checked: 1 month ago

Abstract

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in AdaGrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.
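To make the annealing idea in the abstract concrete, here is a minimal sketch of a much simpler relative of the method: preconditioned stochastic gradient Langevin dynamics with an RMSprop-style preconditioner and an increasing inverse temperature. It is not the authors' algorithm and omits the adaptive element-wise momentum entirely. The function names, the polynomial schedule `beta = beta0 * t**anneal_rate`, and all default values are assumptions chosen for illustration. As the inverse temperature grows, the injected noise shrinks and the update approaches plain RMSprop, which is the zero-temperature intuition the abstract describes.

```python
import numpy as np


def noisy_quadratic_grad(theta, rng, target, noise_std=0.1):
    """Hypothetical stand-in for a minibatch gradient of 0.5 * ||theta - target||^2."""
    return (theta - target) + noise_std * rng.standard_normal(theta.shape)


def annealed_psgld(grad_fn, theta0, n_steps=2000, lr=1e-2, alpha=0.99,
                   eps=1e-3, beta0=1.0, anneal_rate=1.0, seed=0):
    """Annealed, RMSprop-preconditioned Langevin updates (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)  # running average of squared gradients
    for t in range(1, n_steps + 1):
        g = grad_fn(theta, rng)
        v = alpha * v + (1.0 - alpha) * g * g
        precond = 1.0 / (np.sqrt(v) + eps)   # element-wise preconditioner
        beta = beta0 * t ** anneal_rate      # assumed inverse-temperature schedule
        noise = rng.standard_normal(theta.shape)
        # Gradient step plus Langevin noise scaled by the temperature 1 / beta.
        theta = theta - lr * precond * g + np.sqrt(2.0 * lr * precond / beta) * noise
    return theta


target = np.array([1.0, 1.0])
estimate = annealed_psgld(
    lambda th, rng: noisy_quadratic_grad(th, rng, target),
    theta0=[3.0, -2.0],
)
print(estimate)
```

Early iterations behave like a sampler exploring the landscape; later iterations, with little injected noise left, behave like an optimizer settling near a mode. Setting `anneal_rate=0.0` keeps the temperature fixed, which gives a plain sampler with no annealing.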


📜 Similar Papers

In the same crypt — Machine Learning (Stat)

R.I.P. 👻 Ghosted

Graph Attention Networks

Petar Veličković, Guillem Cucurull, ... (+4 more)

stat.ML 🏛 ICLR 📚 24.7K cites 8 years ago

R.I.P. 👻 Ghosted

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, Jeff Dean

stat.ML 🏛 arXiv 📚 22.9K cites 11 years ago

R.I.P. 👻 Ghosted

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

stat.ML 🏛 arXiv 📚 12.0K cites 9 years ago

R.I.P. 👻 Ghosted

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Yarin Gal, Zoubin Ghahramani

stat.ML 🏛 ICML 📚 11.0K cites 10 years ago

R.I.P. 👻 Ghosted

Domain-Adversarial Training of Neural Networks

Yaroslav Ganin, Evgeniya Ustinova, ... (+6 more)

stat.ML 🏛 JMLR 📚 10.8K cites 10 years ago

R.I.P. 👻 Ghosted

Deep Learning with Differential Privacy

Martín Abadi, Andy Chu, ... (+5 more)

stat.ML 🏛 CCS 📚 7.2K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago