Creating smaller, faster, production-worthy mobile machine learning models
Getting machine learning models ready for use on device is a major challenge. Drag-and-drop training tools can get you started, but the models they produce aren't small enough or fast enough to ship. Jameson Toole walks you through optimization, pruning, and compression techniques to keep app sizes small and inference speeds high.
| Talk Title | Creating smaller, faster, production-worthy mobile machine learning models |
|------------|---------------------------------------------------------------------------|
| Speakers | Jameson Toole (Fritz AI) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | Put AI to Work |
| Location | London, United Kingdom |
| Date | October 15-17, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Getting machine learning models ready for use on device is a major challenge. Drag-and-drop training tools can get you started, but the models they produce aren’t small enough or fast enough to ship. Jameson Toole walks you through optimization, pruning, and compression techniques to keep app sizes small and inference speeds high. Jameson explores flexible model architectures that meet performance and accuracy requirements across devices and platforms. You’ll discover pruning and distillation techniques to optimize model performance and quantization tools to compress models to a fraction of their original size. Jameson gives you a practical example of this process as he creates an artistic style transfer model that’s just 17 KB. All of these techniques are applied to mobile machine learning frameworks such as Core ML and TensorFlow Lite.
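To make the compression idea concrete, here is a minimal sketch of the core operation behind weight quantization: affine mapping of float32 weights to int8, which shrinks storage 4x at the cost of a small, bounded reconstruction error. This is an illustrative standalone implementation, not the talk's actual pipeline; in practice frameworks like TensorFlow Lite or Core ML Tools perform this conversion for you.

```python
import numpy as np

def quantize_int8(weights):
    """Affine-quantize float32 weights to int8.

    Returns (q, scale, zero_point) such that
    weights ~= (q - zero_point) * scale.
    """
    lo, hi = float(weights.min()), float(weights.max())
    # Map the observed [lo, hi] range onto the 256 int8 levels.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale)) - 128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 weights from int8 values."""
    return (q.astype(np.float32) - zero_point) * scale

# Toy weight tensor (illustrative values, not from a real model).
w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
# int8 storage is 4x smaller than float32; per-weight error stays within one scale step.
```

The same principle, applied tensor-by-tensor with calibrated ranges, is what lets tools like `tf.lite.TFLiteConverter` (with `optimizations = [tf.lite.Optimize.DEFAULT]`) compress a model to a fraction of its original size.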