December 6, 2019

Deep neural network model compression and an efficient inference engine

Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. Song Han explains how deep compression addresses this limitation by reducing the storage requirements of neural networks without affecting their accuracy, and proposes an energy-efficient inference engine (EIE) that runs inference directly on the compressed model.

Talk Title: Deep neural network model compression and an efficient inference engine
Speakers: Song Han (Stanford University)
Conference: O’Reilly Artificial Intelligence Conference
Location: New York, New York
Date: September 26-27, 2016
URL: Talk Page
Slides: Talk Slides

Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. Song Han explains how deep compression addresses this limitation by reducing the storage requirements of neural networks without affecting their accuracy. On the ImageNet dataset, this method reduced the storage required by AlexNet by 35x, from 240 MB to 6.9 MB, and by VGG-16 by 49x, from 552 MB to 11.3 MB, both with no loss of accuracy. Deep compression also makes it practical to ship complex neural networks in mobile applications, where application size and download bandwidth are constrained, and allows the model to fit in an on-chip SRAM cache rather than off-chip DRAM.

Song also proposes an energy-efficient inference engine (EIE) that performs inference directly on the compressed model, accelerating the sparse matrix-vector multiplication that pruning produces. Evaluated on nine DNN benchmarks against uncompressed CPU and GPU implementations of the same networks, EIE is 189x and 13x faster, respectively. Delivering 102 GOPS at only 600 mW, it is also 24,000x and 3,000x more energy efficient than the CPU and GPU.
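
As a rough illustration of the first two stages of deep compression (pruning and weight sharing), the sketch below applies magnitude pruning and a small shared codebook to a random weight matrix. The layer shape, the 90% sparsity target, and the 4-bit codebook are illustrative assumptions, not values from the talk; the real method learns the codebook with k-means, retrains the surviving weights, and additionally Huffman-codes the result.

```python
# Minimal sketch of the first two stages of deep compression on a toy
# layer: magnitude pruning, then weight sharing via a small codebook.
# Shapes, the 90% sparsity target, and the 4-bit codebook are
# illustrative assumptions, not values from the talk.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 512)).astype(np.float32)

# Stage 1: prune -- zero out the 90% of weights smallest in magnitude.
threshold = np.quantile(np.abs(weights), 0.90)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

# Stage 2: quantize -- map each surviving weight to one of 2^4 = 16
# shared values, so each is stored as a 4-bit index into a codebook.
# (Deep compression learns the codebook with k-means and retrains it;
# quantiles stand in for that here.)
nonzero = pruned[pruned != 0]
codebook = np.quantile(nonzero, np.linspace(0.0, 1.0, 16))
indices = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
quantized = codebook[indices]          # values used at inference time

# Storage estimate, ignoring sparse-position metadata and the final
# Huffman-coding stage, both of which the full method includes.
dense_bytes = weights.size * 4                       # fp32 baseline
compressed_bytes = (nonzero.size * 4 + codebook.size * 32) // 8
print(f"dense: {dense_bytes} B, compressed: ~{compressed_bytes} B")
```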

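The kernel EIE accelerates is this kind of sparse matrix-vector product. Below is a minimal NumPy/SciPy sketch of the access pattern under assumed toy shapes and sparsity: the pruned weights are stored column-wise in compressed form, and columns belonging to zero input activations are skipped entirely, so work scales with the nonzeros on both sides. EIE itself uses a CSC-like format with relative indices and fixed-point arithmetic in hardware; this is only an illustration of the idea.

```python
# Minimal sketch of the sparse matrix-vector product EIE accelerates.
# Pruned weights are stored column-wise in compressed (CSC) form, and
# columns for zero input activations are skipped entirely. Shapes and
# sparsity are illustrative assumptions.
import numpy as np
from scipy.sparse import csc_matrix

rng = np.random.default_rng(1)
dense = rng.standard_normal((256, 512)) * (rng.random((256, 512)) < 0.1)
w = csc_matrix(dense)                          # ~10% of entries stored
x = np.maximum(rng.standard_normal(512), 0.0)  # ReLU output, ~half zero

y = np.zeros(256)
for j in np.flatnonzero(x):                    # skip zero activations
    lo, hi = w.indptr[j], w.indptr[j + 1]      # this column's nonzeros
    y[w.indices[lo:hi]] += w.data[lo:hi] * x[j]

assert np.allclose(y, dense @ x)               # matches dense reference
```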