Data Transfer Optimizations for Host-CPU and Accelerators in AXI4MLIR

February 29, 2024 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jude Haris, Nicolas Bohm Agostini, Antonino Tumeo, David Kaeli, José Cano
arXiv ID: 2402.19184
Category: cs.PL (Programming Languages)
Citations: 0
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

As custom hardware accelerators become more prevalent, it becomes increasingly important to automatically generate efficient host-driver code that can fully leverage the capabilities of these accelerators. This approach saves time and reduces the likelihood of errors that can occur during manual implementation. AXI4MLIR extends the MLIR compiler framework to generate host-driver code for custom accelerators for linear algebra problems. By leveraging specific compiler optimizations, we can further increase accelerator utilization. In this work we offer two key observations through a MatMul accelerator case study. First, the accelerator's compute core utilization is less than 10%, and second, the critical latency bottleneck is caused by copying data between the heap and memory-mapped DMA buffers. We identify a set of missing host code optimizations to improve the under-utilization and the latency bottleneck. Therefore, we propose three key host-code data-movement-related optimizations, extending AXI4MLIR. The optimizations provide DMA-based data allocation, coalescing of DMA transfers, and pipelining of the accelerator's load, compute, and store stages.
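To make the third optimization in the abstract concrete, here is a minimal timing sketch of why pipelining the load, compute, and store stages can raise utilization. It is not the AXI4MLIR implementation: the function names, the per-tile stage times, and the assumption that the three stages can overlap freely on separate resources are all hypothetical, chosen only for illustration.

```python
def sequential_time(n_tiles, load, compute, store):
    """Total time when each tile is loaded, computed, and stored
    before the next tile starts (no overlap between stages)."""
    return n_tiles * (load + compute + store)


def pipelined_time(n_tiles, load, compute, store):
    """Total time when load, compute, and store overlap across tiles.

    Assumption (hypothetical model): each stage is an independent
    resource that handles one tile at a time, and a tile may enter a
    stage as soon as it has left the previous stage and that stage
    is free.
    """
    stage_times = (load, compute, store)
    done = [0.0] * len(stage_times)  # when each stage finished its latest tile
    for _ in range(n_tiles):
        ready = 0.0  # when the current tile left the previous stage
        for s, t in enumerate(stage_times):
            start = max(ready, done[s])  # wait for the tile and for the stage
            done[s] = start + t
            ready = done[s]
    return done[-1]


if __name__ == "__main__":
    # Illustrative stage times in arbitrary units, not measurements from the paper.
    n, l, c, s = 4, 2, 1, 2
    print("sequential:", sequential_time(n, l, c, s))
    print("pipelined: ", pipelined_time(n, l, c, s))
```

In this model the pipelined total approaches the number of tiles times the slowest stage, so when data movement dominates, as the abstract reports, shortening the transfers (for example by avoiding heap-to-DMA-buffer copies or coalescing transfers) matters as much as overlapping them.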



📜 Similar Papers

In the same crypt — Programming Languages

R.I.P. 👻 Ghosted

Ascertaining Uncertainty for Efficient Exact Cache Analysis

Valentin Touzeau, Claire Maïza, ... (+2 more)

cs.PL 🏛 CAV 📚 816 cites 8 years ago

R.I.P. 👻 Ghosted

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Nicolas Vasilache, Oleksandr Zinenko, ... (+7 more)

cs.PL 🏛 arXiv 📚 472 cites 8 years ago

R.I.P. 👻 Ghosted

Glow: Graph Lowering Compiler Techniques for Neural Networks

Nadav Rotem, Jordan Fix, ... (+16 more)

cs.PL 🏛 arXiv 📚 318 cites 8 years ago

R.I.P. 👻 Ghosted

Learnable Programming: Blocks and Beyond

David Bau, Jeff Gray, ... (+3 more)

cs.PL 🏛 CACM 📚 298 cites 9 years ago

R.I.P. 👻 Ghosted

Scenic: A Language for Scenario Specification and Scene Generation

Daniel J. Fremont, Tommaso Dreossi, ... (+4 more)

cs.PL 🏛 ACM-SIGPLAN Symposium on Programming Language Design and Implementation 📚 297 cites 7 years ago

R.I.P. 👻 Ghosted

Vandal: A Scalable Security Analysis Framework for Smart Contracts

Lexi Brent, Anton Jurisevic, ... (+6 more)

cs.PL 🏛 arXiv 📚 296 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago