Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
May 03, 2025 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Samuel J. Kaufman, RenΓ© Just, Rastislav Bodik
arXiv ID
2505.01637
Category
cs.PL: Programming Languages
Cross-listed
cs.LG
Citations
0
Venue
arXiv.org
Last Checked
4 months ago
Abstract
High-throughput neural network inference requires coordinating many optimization decisions, including parallel tiling, microkernel selection, and data layout. The product of these decisions forms a search space of programs which is typically intractably large. Existing approaches (e.g., auto-schedulers) often address this problem by sampling this space heuristically. In contrast, we introduce a dynamic-programming-based approach to explore more of the search space by iteratively decomposing large program specifications into smaller specifications reachable from a set of rewrites, then composing a final program from each rewrite that minimizes an affine cost model. To reduce memory requirements, we employ a novel memoization table representation, which indexes specifications by coordinates in $Z_{\geq 0}$ and compresses identical, adjacent solutions. This approach can visit a much larger set of programs than prior work. To evaluate the approach, we developed Morello, a compiler which lowers specifications roughly equivalent to a few-node XLA computation graph to x86. Notably, we found that an affine cost model is sufficient to surface high-throughput programs. For example, Morello synthesized a collection of matrix multiplication benchmarks targeting a Zen 1 CPU, including a 1x2048x16384, bfloat16-to-float32 vector-matrix multiply, which was integrated into Google's gemma.cpp.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Programming Languages
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
R.I.P.
π»
Ghosted
Glow: Graph Lowering Compiler Techniques for Neural Networks
R.I.P.
π»
Ghosted
Learnable Programming: Blocks and Beyond
R.I.P.
π»
Ghosted
Scenic: A Language for Scenario Specification and Scene Generation
R.I.P.
π»
Ghosted
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted