Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization
June 06, 2017 Β· Declared Dead Β· π Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yossi Arjevani
arXiv ID
1706.01686
Category
math.OC: Optimization & Control
Cross-listed
cs.LG,
stat.ML
Citations
12
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure by itself, is not sufficient for obtaining a complexity bound of $\tilde{\cO}((n+L/ΞΌ)\ln(1/Ξ΅))$ for $L$-smooth and $ΞΌ$-strongly convex individual functions - one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an `accelerated' complexity bound of $\tilde{\cO}((n+\sqrt{n L/ΞΌ})\ln(1/Ξ΅))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $\tilde{\cO}(n+L/Ξ΅)$, assuming that (on average) the same update rule is used in every iteration, and $\tilde{\cO}(n+\sqrt{nL/Ξ΅})$, otherwise.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Optimization & Control
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Local SGD Converges Fast and Communicates Little
R.I.P.
π»
Ghosted
On Lazy Training in Differentiable Programming
π
π
The Cartographer
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
R.I.P.
π»
Ghosted
Learned Primal-dual Reconstruction
R.I.P.
π»
Ghosted
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted