Simplifying Dependent Reductions in the Polyhedral Model
July 22, 2020 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Cambridge Yang, Eric Atkinson, Michael Carbin
arXiv ID
2007.11203
Category
cs.PL: Programming Languages
Citations
3
Venue
arXiv.org
Last Checked
4 months ago
Abstract
A Reduction -- an accumulation over a set of values, using an associative and commutative operator -- is a common computation in many numerical computations, including scientific computations, machine learning, computer vision, and financial analytics. Contemporary polyhedral-based compilation techniques make it possible to optimize reductions, such as prefix sums, in which each component of the reduction's output potentially shares computation with another component in the reduction. Therefore an optimizing compiler can identify the computation shared between multiple components and generate code that computes the shared computation only once. These techniques, however, do not support reductions that -- when phrased in the language of the polyhedral model -- span multiple dependent statements. In such cases, existing approaches can generate incorrect code that violates the data dependences of the original, unoptimized program. In this work, we identify and formalize the optimization of dependent reductions as an integer bilinear program. We present a heuristic optimization algorithm that uses an affine sequential schedule of the program to determine how to simplfy reductions yet still preserve the program's dependences. We demonstrate that the algorithm provides optimal complexity for a set of benchmark programs from the literature on probabilistic inference algorithms, whose performance critically relies on simplifying these reductions. The complexities for 10 of the 11 programs improve siginifcantly by factors at least of the sizes of the input data, which are in the range of $10^4$ to $10^6$ for typical real application inputs. We also confirm the significance of the improvement by showing speedups in wall-clock time that range from $1.1\text{x}$ to over $10^6\text{x}$.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Programming Languages
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
R.I.P.
π»
Ghosted
Glow: Graph Lowering Compiler Techniques for Neural Networks
R.I.P.
π»
Ghosted
Learnable Programming: Blocks and Beyond
R.I.P.
π»
Ghosted
Scenic: A Language for Scenario Specification and Scene Generation
R.I.P.
π»
Ghosted
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted