The Ethereal
Communication-Efficient Gluon in Federated Learning
April 12, 2026 · Grace Period
Authors
Xun Qian, Alexander Gaponov, Grigory Malinovsky, Peter Richtárik
arXiv ID
2604.10689
Category
cs.LG: Machine Learning
Citations
0
Abstract
Recent work has shown that Muon-type optimizers, which are based on linear minimization oracles (LMOs) over non-Euclidean norm balls, have the potential to outperform Adam-type methods in practice when training large language models. Because large-scale neural networks are trained across many machines, communication cost becomes the bottleneck. To address it, we study Gluon, an extension of Muon to the more general layer-wise $(L^0, L^1)$-smooth setting, combined with both unbiased and contractive compressors. To reduce the compression error, we apply the variance-reduction technique of SARAH in our compressed methods. We establish convergence rates and show improved communication cost under certain conditions. As a byproduct, we obtain a new variance-reduced algorithm with a faster convergence rate than Gluon. We also incorporate momentum variance reduction (MVR) into these compressed algorithms and derive comparable communication cost under weaker conditions when $L_i^1 \neq 0$. Finally, several numerical experiments verify the superior performance of our compressed algorithms in terms of communication cost.
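To make the two compressor classes named in the abstract concrete, here is a minimal sketch using two standard textbook examples: Top-k as a contractive compressor and Rand-k as an unbiased one. The abstract does not say which compressors the paper actually uses, so the function names (`top_k`, `rand_k`, `sq_norm`), the test vector, and the choice of these two operators are all assumptions made for illustration.

```python
import random


def top_k(x, k):
    """Contractive compressor (illustrative): keep the k largest-magnitude entries."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def rand_k(x, k, rng):
    """Unbiased compressor (illustrative): keep k random entries, scaled by d/k."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k  # rescaling makes the expectation equal to x
    return [scale * x[i] if i in keep else 0.0 for i in range(d)]


def sq_norm(x):
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


# Hypothetical gradient vector, chosen only for the demonstration.
x = [0.5, -3.0, 0.1, 2.0, -0.2, 1.0]
k, d = 2, len(x)

# Top-k satisfies the contraction bound ||C(x) - x||^2 <= (1 - k/d) ||x||^2.
compressed = top_k(x, k)
error = sq_norm([c - v for c, v in zip(compressed, x)])
contraction_holds = error <= (1 - k / d) * sq_norm(x)

# Rand-k is unbiased: the empirical mean of many draws approaches x.
rng = random.Random(0)
n_trials = 20000
mean = [0.0] * d
for _ in range(n_trials):
    draw = rand_k(x, k, rng)
    mean = [m + v / n_trials for m, v in zip(mean, draw)]
max_bias = max(abs(m - v) for m, v in zip(mean, x))
```

In both cases only k of the d coordinates are transmitted per round, which is where the communication savings come from. The price is a compression error, which is what the variance-reduction techniques mentioned in the abstract are meant to control.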
Similar Papers
In the same crypt: Machine Learning
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts