Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation
May 07, 2019 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ying Sun, Amir Daneshmand, Gesualdo Scutari
arXiv ID
1905.02637
Category
math.OC: Optimization & Control
Cross-listed
cs.DC
Citations
56
Venue
arXiv.org
Last Checked
2 months ago
Abstract
We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth, strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: it employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, as in proximal-gradient schemes), coupled with a perturbed (push-sum) consensus mechanism that aims to locally track the gradient of $F$. SONATA achieves precision $\varepsilon>0$ on the objective value in $\mathcal{O}(\kappa_g \log(1/\varepsilon))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(\kappa_g (1-\rho)^{-1/2} \log(1/\varepsilon)\big)$ communication steps, where $\kappa_g$ is the condition number of $F$ and $\rho$ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize only $F$, whose rates depend on quantities much larger than $\kappa_g$ (e.g., the worst-case condition number among the agents). When considering in particular empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $\varepsilon>0$ in $\mathcal{O}\big((\beta/\mu) \log(1/\varepsilon)\big)$ iterations and $\tilde{\mathcal{O}}\big((\beta/\mu) (1-\rho)^{-1/2} \log(1/\varepsilon)\big)$ communication steps, where $\beta$ measures the degree of similarity of the agents' losses and $\mu$ is the strong convexity constant of $F$. Therefore, when $\beta/\mu< \kappa_g$, the use of high-order surrogates yields provably faster rates than what is achievable with first-order models; this is done without exchanging any Hessian matrices over the network.
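The abstract only sketches the mechanism, so below is a minimal, hedged illustration of the gradient-tracking idea SONATA builds on: plain gradient tracking with linearized (first-order) surrogates on an undirected ring graph with doubly stochastic weights. It is not the SONATA algorithm of the paper, which additionally handles the nonsmooth term $G$, convex constraints, and directed time-varying graphs via push-sum, and allows general (possibly higher-order) surrogates; every name and parameter here (the quadratic losses, `grad_f`, `W`, `alpha`) is an illustrative assumption.

```python
import numpy as np

# Hedged sketch (assumptions, not the paper's SONATA): plain gradient
# tracking with linearized surrogates for minimizing F(x) = sum_i f_i(x),
# where each f_i is a private strongly convex quadratic, over an
# undirected ring graph with doubly stochastic weights.

rng = np.random.default_rng(0)
n_agents, dim = 5, 3

# Agent i's private loss: f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((10, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(10) for _ in range(n_agents)]

def grad_f(i, x):
    """Gradient of agent i's local loss at x."""
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring: each agent averages itself
# with its two neighbours.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i, j % n_agents] = 1.0 / 3.0

alpha = 0.01                               # step size (assumed small enough)
x = np.zeros((n_agents, dim))              # local iterates, one row per agent
grads = np.array([grad_f(i, x[i]) for i in range(n_agents)])
y = grads.copy()                           # gradient trackers

for _ in range(2000):
    # Local (linearized-surrogate) step using the tracked gradient,
    # followed by a consensus/mixing step with the neighbours.
    x_new = W @ (x - alpha * y)
    # Gradient-tracking update: mix the trackers and add the change in the
    # local gradient, so each y_i keeps estimating (1/n) * sum_j grad f_j.
    grads_new = np.array([grad_f(i, x_new[i]) for i in range(n_agents)])
    y = W @ y + grads_new - grads
    x, grads = x_new, grads_new

# Sanity check: all agents should (approximately) agree on the minimizer of F.
x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(Ai.T @ bi for Ai, bi in zip(A, b)))
print("max deviation from x*:", np.max(np.abs(x - x_star)))
```

With a doubly stochastic $W$ and a sufficiently small step size, each tracker $y_i$ estimates the network-average gradient of $F$, which is what lets rates depend on the global condition number $\kappa_g$ rather than the worst local one; SONATA replaces the linearized local step above with the solution of a surrogate subproblem and replaces plain averaging with a perturbed push-sum scheme for directed, time-varying graphs.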
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt – Optimization & Control
Local SGD Converges Fast and Communicates Little
On Lazy Training in Differentiable Programming
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
Learned Primal-dual Reconstruction
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Died the same way – Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System