QUENN: QUantization Engine for low-power Neural Networks

November 14, 2018 · Declared Dead · 🏛 ACM International Conference on Computing Frontiers

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Miguel de Prado, Maurizio Denna, Luca Benini, Nuria Pazos
arXiv ID: 1811.05896
Category: cs.NE (Neural & Evolutionary)
Cross-listed: cs.LG
Citations: 15
Venue: ACM International Conference on Computing Frontiers
Last Checked: 4 months ago

Abstract

Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand for computational resources required by deep neural networks may be alleviated by approximate computing techniques, most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes is an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as Automotive, Medical Healthcare and Consumer Electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine depends on a fine-grained workflow which enables a Neural Network Design Exploration and a sensitivity analysis of each layer for quantization. We demonstrate the engine with a case study on AlexNet and VGG16 for three different techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and show the potential of k-means clustering. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve savings of over 55.64% in weight storage and 69.17% in run-time memory accesses, with less than a 1% drop in top-5 accuracy on ImageNet.
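The three direct quantization techniques named in the abstract can be illustrated with a minimal NumPy sketch. This is not the LPDNN engine or its API: the function names, the 8-bit default, the per-tensor choice of fractional bits, and the quantile-based centroid initialization are all assumptions made for illustration, and the sketch does not reproduce the paper's Gaussian quantizer or its reported savings.

```python
import numpy as np


def quantize_fixed_point(w, total_bits=8, frac_bits=4):
    """Standard fixed-point: one fixed format, round to nearest, saturate."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax)
    return q / scale


def quantize_dynamic_fixed_point(w, total_bits=8):
    """Dynamic fixed-point: pick the fractional bits from the tensor's range.

    Assumption: one format per tensor (e.g. per layer), chosen so that the
    largest magnitude still fits in the signed integer range.
    """
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:
        return np.zeros_like(w, dtype=float)
    # Integer bits (excluding sign) needed to represent max_abs; may be negative.
    int_bits = int(np.floor(np.log2(max_abs))) + 1
    frac_bits = total_bits - 1 - int_bits
    return quantize_fixed_point(w, total_bits, frac_bits)


def quantize_kmeans(w, n_clusters=16, n_iters=20):
    """k-means clustering: replace each weight by its nearest centroid.

    Plain 1-D Lloyd iterations; centroids start at evenly spaced quantiles
    (an assumption, chosen to keep the sketch deterministic).
    """
    flat = np.asarray(w, dtype=float).ravel()
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iters):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:  # leave empty clusters where they are
                centroids[k] = members.mean()
    # Final assignment against the updated centroids.
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[idx].reshape(np.shape(w)), centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.05, size=1000)  # synthetic, roughly Gaussian weights
    for name, q in [
        ("fixed-point", quantize_fixed_point(weights)),
        ("dynamic fixed-point", quantize_dynamic_fixed_point(weights)),
        ("k-means", quantize_kmeans(weights)[0]),
    ]:
        print(name, "MSE:", float(np.mean((weights - q) ** 2)))
```

With k-means, only the centroid table and a small per-weight index need to be stored (4 bits per index for 16 clusters), which is the general mechanism behind storage savings from clustering; the exact figures in the abstract depend on the authors' networks and settings.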