Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
June 05, 2024 Β· Declared Dead Β· π Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu
arXiv ID
2406.02939
Category
math.OC: Optimization & Control
Cross-listed
cs.DC,
cs.LG
Citations
2
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. To address this challenge, we propose D-AdaST, a Distributed Adaptive minimax method with Stepsize Tracking. The key strategy is to employ an adaptive stepsize tracking protocol involving the transmission of two extra (scalar) variables. This protocol ensures the consistency among stepsizes of nodes, eliminating the steady-state error due to the lack of coordination of stepsizes among nodes that commonly exists in vanilla distributed adaptive methods, and thus guarantees exact convergence. For nonconvex-strongly-concave distributed minimax problems, we characterize the specific transient times that ensure time-scale separation of stepsizes and quasi-independence of networks, leading to a near-optimal convergence rate of $\tilde{\mathcal{O}} \left( Ξ΅^{-\left( 4+Ξ΄\right)} \right)$ for any small $Ξ΄> 0$, matching that of the centralized counterpart. To our best knowledge, D-AdaST is the first distributed adaptive method achieving near-optimal convergence without knowing any problem-dependent parameters for nonconvex minimax problems. Extensive experiments are conducted to validate our theoretical results.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Optimization & Control
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Local SGD Converges Fast and Communicates Little
R.I.P.
π»
Ghosted
On Lazy Training in Differentiable Programming
π
π
The Cartographer
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
R.I.P.
π»
Ghosted
Learned Primal-dual Reconstruction
R.I.P.
π»
Ghosted
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted