PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy
August 05, 2024 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique
arXiv ID
2408.02412
Category
cs.AR: Hardware Architecture
Cross-listed
cs.AI,
cs.LG,
cs.NE
Citations
0
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks. To improve the performance and energy efficiency of CNN inference, the employment of specialized hardware accelerators is prevalent. However, CNN accelerators still face performance- and energy-efficiency challenges due to high off-chip memory (DRAM) access latency and energy, which are especially crucial for latency- and energy-constrained embedded applications. Moreover, different DRAM architectures have different profiles of access latency and energy, thus making it challenging to optimize them for high performance and energy-efficient CNN accelerators. To address this, we present PENDRAM, a novel design space exploration methodology that enables high-performance and energy-efficient CNN acceleration through a generalized DRAM data mapping policy. Specifically, it explores the impact of different DRAM data mapping policies and DRAM architectures across different CNN partitioning and scheduling schemes on the DRAM access latency and energy, then identifies the pareto-optimal design choices. The experimental results show that our DRAM data mapping policy improves the energy-delay-product of DRAM accesses in the CNN accelerator over other mapping policies by up to 96%. In this manner, our PENDRAM methodology offers high-performance and energy-efficient CNN acceleration under any given DRAM architectures for diverse embedded AI applications.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Hardware Architecture
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Corona: System Implications of Emerging Nanophotonic Technology
R.I.P.
π»
Ghosted
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
R.I.P.
π»
Ghosted
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
R.I.P.
π»
Ghosted
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
R.I.P.
π»
Ghosted
SpArch: Efficient Architecture for Sparse Matrix Multiplication
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted
Explanation in Artificial Intelligence: Insights from the Social Sciences
R.I.P.
π»
Ghosted