A Survey of Methods for Collective Communication Optimization and Tuning
November 19, 2016 ยท The Cartographer ยท ๐ arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: A Survey of Methods for Collective Communication Optimization and Tuning"
Evidence collected by the PWNC Scanner
Authors
Udayanga Wickramasinghe, Andrew Lumsdaine
arXiv ID
1611.06334
Category
cs.DC: Distributed Computing
Citations
22
Venue
arXiv.org
Last Checked
2 days ago
Abstract
New developments in HPC technology in terms of increasing computing power on multi/many core processors, high-bandwidth memory/IO subsystems and communication interconnects, pose a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, number of optimization options that shows up with each new technology or software framework has resulted in a \emph{combinatorial explosion} in feature space for tuning collective parameters such that finding the optimal set has become a nearly impossible task. Applicability of algorithmic choices available for optimizing collective communication depends largely on the scalability requirement for a particular usecase. This problem can be further exasperated by any requirement to run collective problems at very large scales such as in the case of exascale computing, at which impractical tuning by brute force may require many months of resources. Therefore application of statistical, data mining and artificial Intelligence or more general hybrid learning models seems essential in many collectives parameter optimization problems. We hope to explore current and the cutting edge of collective communication optimization and tuning methods and culminate with possible future directions towards this problem.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Distributed Computing
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
๐ป
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
๐ป
Ghosted
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
R.I.P.
๐ป
Ghosted
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
R.I.P.
๐ป
Ghosted