Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
August 15, 2016 ยท Declared Dead ยท ๐ Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Vitaly Feldman
arXiv ID
1608.04414
Category
cs.LG: Machine Learning
Cross-listed
stat.ML
Citations
73
Venue
Neural Information Processing Systems
Last Checked
3 months ago
Abstract
In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq {\mathbf E}_{{\mathbf f}\sim D}[{\mathbf f}(x)]$ over a convex set $\cal K \subset {\mathbb R}^d$ where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\cal K$. The optimization is commonly based on i.i.d.~samples $f^1,f^2,\ldots,f^n$ from $D$. A standard approach to such problems is empirical risk minimization (ERM) that optimizes $F_S(x) \doteq \frac{1}{n}\sum_{i\leq n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\cal K$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\cal K$ of bounded radius, ERM requires sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on $ฮฉ(\log d)$ dependence proved for $\ell_2/\ell_2$ setting by Shalev-Shwartz et al. (2009). In stark contrast, these problems can be solved using dimension-independent number of samples for $\ell_2/\ell_2$ setting and $\log d$ dependence for $\ell_1/\ell_\infty$ setting using other approaches. We further show that our lower bound applies even if the functions in the support of $D$ are smooth and efficiently computable and even if an $\ell_1$ regularization term is added. Finally, we demonstrate that for a more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs an infinite gap appears already in dimension 2.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted