Deep neural network model compression and an efficient inference engine
December 6, 2019
Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. Song Han explains how deep compression addresses this limitation by reducing the storage requirement of neural networks without affecting their accuracy and proposes an energy-efficient inference engine (EIE) that works with this model.