Deep neural network model compression and an efficient inference engine
Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. Song Han explains how deep compression addresses this limitation by reducing the storage requirement of neural networks without affecting their accuracy, and he proposes an energy-efficient inference engine (EIE) that runs inference directly on the compressed model.
| Talk Title | Deep neural network model compression and an efficient inference engine |
| --- | --- |
| Speakers | Song Han (Stanford University) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | |
| Location | New York, New York |
| Date | September 26-27, 2016 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. Song Han explains how deep compression addresses this limitation by reducing the storage requirement of neural networks without affecting their accuracy. (On the ImageNet dataset, this method reduced the storage required by AlexNet by 35x, from 240 MB to 6.9 MB, and by VGG-16 by 49x, from 552 MB to 11.3 MB, both with no loss of accuracy.) Deep compression also makes complex neural networks practical in mobile applications where application size and download bandwidth are constrained, and it allows the model to fit in an on-chip SRAM cache rather than off-chip DRAM.

Song also proposes an energy-efficient inference engine (EIE) that performs inference directly on the compressed model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Evaluated on nine DNN benchmarks, EIE is 189x faster than a CPU implementation and 13x faster than a GPU implementation of the same DNN without compression. With a processing power of 102 GOPS at only 600 mW, EIE is also 24,000x and 3,000x more energy efficient than the CPU and GPU, respectively.
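To make the compression idea concrete, here is a minimal sketch of two of its core ingredients, magnitude pruning and weight sharing, written in plain NumPy. The threshold, cluster count, and random weight matrix are illustrative assumptions rather than values from the talk, and the full deep compression pipeline also involves retraining the pruned network and Huffman-coding the resulting indices.

```python
# Minimal sketch of magnitude pruning plus weight sharing.
# All numbers here (threshold, cluster count, matrix size) are illustrative.
import numpy as np

def prune(weights, threshold=0.1):
    """Zero out weights whose magnitude falls below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def share_weights(weights, mask, n_clusters=16):
    """Cluster the surviving weights (simple 1-D k-means) so each weight
    is stored as a small index into a shared codebook of centroids."""
    values = weights[mask]
    centroids = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(10):
        # Assign every surviving weight to its nearest centroid.
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = values[idx == k].mean()
    return centroids, idx

weights = 0.1 * np.random.randn(256, 256).astype(np.float32)
pruned, mask = prune(weights)
codebook, indices = share_weights(pruned, mask)
# Storage drops from 32 bits per weight to a 4-bit codebook index per
# surviving weight, plus the overhead of the sparse index structure.
print(f"nonzero weights: {mask.sum()} / {mask.size}")
```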
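As a rough software analogue of the operation EIE accelerates in hardware, the sketch below multiplies a sparse, weight-shared matrix (stored in CSR form with codebook indices in place of weight values) by an activation vector, skipping zero activations. The data layout and function names are assumptions for illustration only; EIE itself is a custom hardware accelerator, not a NumPy routine.

```python
# Sparse matrix-vector multiplication with a shared weight codebook.
import numpy as np

def sparse_shared_matvec(row_ptr, col_idx, weight_idx, codebook, x):
    """Compute y = W @ x where W is stored in CSR form and each nonzero
    entry is a small index into a shared codebook of weight values."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=np.float32)
    for i in range(n_rows):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            xj = x[col_idx[p]]
            if xj != 0.0:  # skip zero activations
                acc += codebook[weight_idx[p]] * xj
        y[i] = acc
    return y

# Tiny illustrative example: a 2x3 matrix with three nonzero entries.
codebook = np.array([0.0, -0.5, 0.25, 1.0], dtype=np.float32)
row_ptr = np.array([0, 2, 3])     # row i holds nonzeros row_ptr[i]:row_ptr[i+1]
col_idx = np.array([0, 2, 1])     # column of each nonzero
weight_idx = np.array([3, 2, 1])  # codebook index of each nonzero
x = np.array([2.0, 0.0, 4.0], dtype=np.float32)
print(sparse_shared_matvec(row_ptr, col_idx, weight_idx, codebook, x))  # [3. 0.]
```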