Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
May 22, 2023 · Declared Dead · Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Behnaz Arzani, Siva Kesava Reddy Kakarla, Miguel Castro, Srikanth Kandula, Saeed Maleki, Luke Marshall
arXiv ID
2305.13479
Category
cs.NI: Networking & Internet
Citations
49
Venue
Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
Last Checked
3 months ago
Abstract
We show that the communication schedulers proposed in recent work for ML collectives do not scale to the increasing problem sizes that arise from training larger models. These schedulers also often produce suboptimal schedules. We make a connection with similar problems in traffic engineering and propose a new method, TECCL, that finds better-quality schedules (e.g., ones that finish collectives faster and/or send fewer bytes) and does so more quickly on larger topologies. We present results on many different GPU topologies that show substantial improvement over the state of the art.
Similar Papers
In the same crypt: Networking & Internet
R.I.P.
Ghosted
The Cartographer
Federated Learning in Mobile Edge Networks: A Comprehensive Survey
The Cartographer
A Survey of Indoor Localization Systems and Technologies
R.I.P.
Ghosted
Survey of Important Issues in UAV Communication Networks
The Cartographer
Network Function Virtualization: State-of-the-art and Research Challenges
The Cartographer
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning