February 20, 2020


Creating smaller, faster, production-worthy mobile machine learning models



Talk Title: Creating smaller, faster, production-worthy mobile machine learning models
Speakers: Jameson Toole (Fritz AI)
Conference: O’Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: London, United Kingdom
Date: October 15-17, 2019
URL: Talk Page
Slides: Talk Slides
Video:

Getting machine learning models ready for use on device is a major challenge. Drag-and-drop training tools can get you started, but the models they produce aren’t small enough or fast enough to ship. Jameson Toole walks you through optimization, pruning, and compression techniques to keep app sizes small and inference speeds high. Jameson explores flexible model architectures that meet performance and accuracy requirements across devices and platforms. You’ll discover pruning and distillation techniques to optimize model performance and quantization tools to compress models to a fraction of their original size. Jameson gives a practical example of this process as he creates an artistic style transfer model that’s just 17 KB. All of these techniques are applied to mobile machine learning frameworks such as Core ML and TensorFlow Lite.
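The two compression ideas the abstract mentions, pruning and quantization, can be sketched in a few lines of NumPy. This is not the speaker's code: it is a minimal illustration of magnitude-based weight pruning and 8-bit affine quantization, the same concepts that production tooling such as TensorFlow Lite's post-training quantization applies with far more care (per-channel scales, gradual pruning during training, and so on). The function names are hypothetical.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest magnitude.

    Minimal sketch: real frameworks prune gradually during training so the
    network can recover accuracy, rather than in one post-hoc pass.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_uint8(weights: np.ndarray):
    """Affine-quantize float weights to uint8: w ≈ q * scale + offset.

    Storing q (1 byte/weight) instead of float32 (4 bytes/weight) is the
    roughly 4x size reduction that post-training quantization provides.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

# Toy example: prune half of a small weight matrix, then quantize it.
w = np.array([[0.1, -0.8], [0.05, 1.2]], dtype=np.float32)
w_pruned = prune_by_magnitude(w, 0.5)      # the two smallest weights become 0
q, scale, offset = quantize_uint8(w)
w_restored = q.astype(np.float32) * scale + offset  # dequantized approximation
```

The restored weights differ from the originals by at most one quantization step (`scale`), which is why 8-bit quantization usually costs little accuracy while cutting model size by about 4x.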
