Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

December 20, 2023 · Declared Dead · 🏛 PARMA-DITAM

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Alireza Amirshahi, Giovanni Ansaloni, David Atienza
arXiv ID: 2312.13000
Category: cs.AR (Hardware Architecture)
Cross-listed: cs.AI
Citations: 1
Venue: PARMA-DITAM
Last Checked: 3 months ago

Abstract

The increasing complexity of transformer models in artificial intelligence drives up their computational cost, memory usage, and energy consumption. Hardware acceleration addresses these challenges with processors and accelerators tailored to transformer models, which support their computational hotspots with high efficiency. However, memory bandwidth can limit the gains that hardware accelerators deliver. Against this backdrop, we propose a novel memory arrangement strategy, governed by the hardware accelerator's kernel size, which effectively minimizes off-chip data access. This arrangement is particularly beneficial for end-to-end transformer model inference, where most of the computation consists of general matrix multiplication (GEMM) operations. Additionally, we address the overhead of non-GEMM operations in transformer models within the scope of this memory data arrangement. Our study explores the implementation and effectiveness of the proposed accelerator-driven data arrangement in both single- and multi-core systems. Our evaluation demonstrates that the approach can achieve a speedup of up to 2.8x when running inference with state-of-the-art transformers.
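The abstract does not spell out the arrangement itself, so the following is only a minimal sketch of one common way a layout can be "governed by the kernel size": storing a matrix tile by tile, so that each k x k block an accelerator consumes sits in one contiguous region of memory. It is not the authors' method. The function names, the square k x k tile, and the use of contiguous address runs as a stand-in for separate off-chip transfers are all assumptions made for illustration.

```python
def to_tile_major(flat, rows, cols, k):
    """Reorder a row-major flat buffer so that each k x k tile is contiguous.

    Assumption: k stands in for the accelerator's kernel size, and both
    matrix dimensions are multiples of k (no padding is handled here).
    """
    if rows % k or cols % k:
        raise ValueError("this sketch assumes dimensions divisible by k")
    out = []
    for tile_r in range(0, rows, k):        # top row of the current tile
        for tile_c in range(0, cols, k):    # left column of the current tile
            for r in range(tile_r, tile_r + k):
                start = r * cols + tile_c
                out.extend(flat[start:start + k])
    return out


def tile_offsets_row_major(tile_r, tile_c, cols, k):
    """Flat offsets of tile (tile_r, tile_c) in the original row-major buffer."""
    return [(tile_r * k + r) * cols + tile_c * k + c
            for r in range(k) for c in range(k)]


def tile_offsets_tile_major(tile_r, tile_c, cols, k):
    """Flat offsets of the same tile after to_tile_major has been applied."""
    tiles_per_row = cols // k
    base = (tile_r * tiles_per_row + tile_c) * k * k
    return list(range(base, base + k * k))


def contiguous_runs(offsets):
    """Count maximal runs of consecutive offsets (hypothetical burst proxy)."""
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:
            runs += 1
    return runs
```

For a 4 x 4 matrix with k = 2, fetching one tile from the row-major buffer touches two separate address runs (one per tile row), while the tile-major buffer serves the same tile from a single run. In this toy model the number of runs per tile grows with k for row-major storage and stays at one for the tiled layout.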

📜 Similar Papers

In the same crypt — Hardware Architecture

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Corona: System Implications of Emerging Nanophotonic Technology

Dana Vantrease, Robert Schreiber, ... (+8 more)

cs.AR 🏛 ISCA 📚 710 cites 2 years ago

R.I.P. 👻 Ghosted

A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

Saber Moradi, Ning Qiao, ... (+2 more)

cs.AR 🏛 IEEE TBCS 📚 544 cites 8 years ago

R.I.P. 👻 Ghosted

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Hanrui Wang, Zhekai Zhang, Song Han

cs.AR 🏛 ISCA 📚 503 cites 5 years ago

R.I.P. 👻 Ghosted

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Charles Eckert, Xiaowei Wang, ... (+6 more)

cs.AR 🏛 ISCA 📚 373 cites 8 years ago

R.I.P. 👻 Ghosted

SpArch: Efficient Architecture for Sparse Matrix Multiplication

Zhekai Zhang, Hanrui Wang, ... (+2 more)

cs.AR 🏛 ISCA 📚 274 cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago

R.I.P. 👻 Ghosted

Equality of Opportunity in Supervised Learning

Moritz Hardt, Eric Price, Nathan Srebro

cs.LG 🏛 NeurIPS 📚 4.9K cites 9 years ago