Run-Time Efficient RNN Compression for Inference on Edge Devices

June 12, 2019 · Declared Dead · 🏛 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
arXiv ID: 1906.04886
Category: cs.LG (Machine Learning)
Cross-listed: cs.NE, stat.ML
Citations: 21
Venue: 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
Last Checked: 4 months ago

Abstract

Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. This scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. This results in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. HMD can compress RNNs by a factor of 2-4x while having a faster run-time than pruning (Zhu & Gupta, 2017) and retaining more model accuracy than matrix factorization (Grachev et al., 2017). We evaluate this technique on 5 benchmarks spanning 3 different applications, illustrating its generality in the domain of edge computing.
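
The weight-matrix split described in the abstract can be illustrated with a short NumPy sketch. This is only one possible reading, not the authors' implementation: the abstract does not say how the rank-1 blocks are laid out, so the code assumes the lower part is tiled column-wise into k blocks, each an outer product b_i c_i^T. The function names (hmd_matvec, hmd_dense, hmd_param_count) and the array layout of A, B, and C are hypothetical and chosen for illustration.

```python
import numpy as np


def hmd_matvec(A, B, C, x):
    """Multiply an HMD-style matrix by x without building the dense matrix.

    Assumed layout (hypothetical, not taken from the paper):
      A: (r, n)      unconstrained upper rows
      B: (k, m - r)  left factor b_i of each rank-1 block
      C: (k, n // k) right factor c_i of each rank-1 block
    Lower block i equals outer(B[i], C[i]) and acts on the i-th slice of x.
    """
    k, block = C.shape
    y_upper = A @ x  # the "richer" features
    # One dot product per block: s[i] = c_i . x_i
    s = np.einsum("kj,kj->k", C, x.reshape(k, block))
    # Sum of s[i] * b_i over all blocks: the "constrained" features
    y_lower = B.T @ s
    return np.concatenate([y_upper, y_lower])


def hmd_dense(A, B, C):
    """Materialize the equivalent dense matrix (for checking only)."""
    lower = np.hstack([np.outer(B[i], C[i]) for i in range(C.shape[0])])
    return np.vstack([A, lower])


def hmd_param_count(A, B, C):
    """Number of stored parameters in the factored form."""
    return A.size + B.size + C.size


# Small usage example with arbitrary sizes
rng = np.random.default_rng(0)
m, n, r, k = 8, 6, 4, 3
A = rng.standard_normal((r, n))
B = rng.standard_normal((k, m - r))
C = rng.standard_normal((k, n // k))
x = rng.standard_normal(n)
y = hmd_matvec(A, B, C, x)
```

Under this assumed layout, the lower part costs k dot products plus one small matrix-vector product instead of a full dense multiply, and the stored parameter count drops from m·n to r·n + k·(m − r) + n. The actual compression factor depends on how many rows are left unconstrained and on the block count, both of which are free choices in this sketch.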