Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

May 07, 2019 · Declared Dead · arXiv.org

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ying Sun, Amir Daneshmand, Gesualdo Scutari
arXiv ID: 1905.02637
Category: math.OC (Optimization & Control)
Cross-listed: cs.DC
Citations: 56
Venue: arXiv.org
Last checked: 2 months ago
Abstract
We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: it employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, as in proximal-gradient schemes), coupled with a perturbed (push-sum) consensus mechanism that tracks locally the gradient of $F$. SONATA achieves precision $\epsilon>0$ on the objective value in $\mathcal{O}(\kappa_g \log(1/\epsilon))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(\kappa_g (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\kappa_g$ is the condition number of $F$ and $\rho$ characterizes the connectivity of the network. This is the first linear-rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize only $F$, whose rates depend on much larger quantities than $\kappa_g$ (e.g., the worst-case condition number among the agents). When considering, in particular, empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $\epsilon>0$ in $\mathcal{O}\big((\beta/\mu) \log(1/\epsilon)\big)$ iterations and $\tilde{\mathcal{O}}\big((\beta/\mu) (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\beta$ measures the degree of similarity of the agents' losses and $\mu$ is the strong convexity constant of $F$. Therefore, when $\beta/\mu < \kappa_g$, the use of high-order surrogates yields provably faster rates than what is achievable by first-order models; this is achieved without exchanging any Hessian matrices over the network.
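Since no code was released (hence this page), here is a minimal, hypothetical Python sketch of the gradient-tracking template in its simplest instance: a linearized surrogate (a plain gradient step) with no nonsmooth term $G$, on a static undirected ring with a doubly stochastic mixing matrix, rather than SONATA's directed, time-varying push-sum setting. This is not the authors' implementation; all names and parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 5, 3

# Local losses f_i(x) = 0.5 x^T A_i x - b_i^T x (smooth, strongly convex),
# so F = sum_i f_i plays the role of the smooth part of the objective.
A = [np.diag(rng.uniform(1.0, 5.0, dim)) for _ in range(n_agents)]
b = [rng.normal(size=dim) for _ in range(n_agents)]
grad = lambda i, x: A[i] @ x - b[i]

# Doubly stochastic mixing matrix for a 5-agent ring (Metropolis weights);
# SONATA's push-sum mechanism handles directed, time-varying graphs instead.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 1.0 / 3.0

alpha = 0.05                                  # local step size
x = np.zeros((n_agents, dim))                 # one copy of the variable per agent
g = np.array([grad(i, x[i]) for i in range(n_agents)])
y = g.copy()                                  # y_i tracks the gradient of F

for _ in range(500):
    # Surrogate step: with a linearized surrogate, the local subproblem's
    # solution is a gradient step along the tracked direction y_i.
    x = W @ (x - alpha * y)                   # local step + consensus averaging
    g_new = np.array([grad(i, x[i]) for i in range(n_agents)])
    y = W @ y + g_new - g                     # gradient-tracking update
    g = g_new

# Every agent's local copy approaches the minimizer of F = sum_i f_i.
x_star = np.linalg.solve(sum(A), sum(b))
print("max error across agents:", np.max(np.abs(x - x_star)))
```

Replacing the gradient step with a richer local subproblem, roughly, minimizing a local model of $F$ built from $f_i$ plus the linear correction $(y_i - \nabla f_i(x_i))^\top x$, is what the abstract means by high-order surrogates: when the agents' losses are statistically similar (small $\beta$), that local model stays close to the global objective, which is where the $\beta/\mu$ rate comes from.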
Community shame: Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt · Optimization & Control

Died the same way · 👻 Ghosted