RACE-IT: A Reconfigurable Analog Computing Engine for In-Memory Transformer Acceleration

November 29, 2023 · Declared Dead · 🏛 ICCD

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Lei Zhao, Aishwarya Natarajan, Luca Buonanno, Archit Gajjar, Ron M. Roth, Sergey Serebryakov, John Moon, Jim Ignowski, Giacomo Pedretti
arXiv ID: 2312.06532
Category: cs.AR (Hardware Architecture)
Cross-listed: cs.ET, cs.LG
Citations: 1
Venue: ICCD
Last Checked: 3 months ago

Abstract

Transformer models represent the cutting edge of Deep Neural Networks (DNNs) and excel in a wide range of machine learning tasks. However, processing these models demands significant computational resources and results in a substantial memory footprint. While In-memory Computing (IMC) offers promise for accelerating Vector-Matrix Multiplications (VMMs) with high computational parallelism and minimal data movement, employing it for other crucial DNN operators remains a formidable task. This challenge is exacerbated by the extensive use of complex activation functions, Softmax, and data-dependent matrix multiplications (DMMuls) within Transformer models. To address this challenge, we introduce a Reconfigurable Analog Computing Engine (RACE) by enhancing Analog Content Addressable Memories (ACAMs) to support broader operations. Based on the RACE, we propose the RACE-IT accelerator (meaning RACE for In-memory Transformers) to enable efficient analog-domain execution of all core operations of Transformer models. Given the flexibility of our proposed RACE in supporting arbitrary computations, RACE-IT is well-suited for adapting to emerging and non-traditional DNN architectures without requiring hardware modifications. We compare RACE-IT with various accelerators. Results show that RACE-IT increases performance by 453x and 15x, and reduces energy by 354x and 122x over the state-of-the-art GPUs and existing Transformer-specific IMC accelerators, respectively.
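
For reference, the operator types the abstract distinguishes (weight-stationary VMMs, Softmax, and DMMuls) can be seen in a plain software version of single-head scaled dot-product attention. The sketch below is textbook attention in pure Python, not the RACE-IT analog implementation; the function names (`matmul`, `softmax`, `attention`) and the toy inputs are illustrative assumptions and do not come from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (standard formulation)."""
    # Weight-stationary products: one operand is a fixed, trained matrix.
    # These are the VMM-style operations that IMC crossbars target.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Data-dependent product (DMMul): both Q and K are computed at run
    # time, so neither operand can be pre-programmed as a static weight.
    k_t = [list(col) for col in zip(*k)]
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]

    # Softmax: a nonlinear, row-wise normalization.
    probs = [softmax(row) for row in scores]

    # Second data-dependent product: attention weights times V.
    return probs, matmul(probs, v)

# Toy input (assumed values): 3 tokens, embedding size 2, identity weights.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
probs, out = attention(x, identity, identity, identity)
```

Only the first three products involve a fixed weight matrix; the two remaining products and the Softmax operate purely on run-time data, which is the gap the abstract says conventional IMC leaves open.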