Advanced model deployments with TensorFlow Serving
Hannes Hapke leads a deep dive into deploying TensorFlow models within minutes with TensorFlow Serving and tuning your serving infrastructure for maximum throughput.
Talk Title | Advanced model deployments with TensorFlow Serving |
Speakers | Hannes Hapke (SAP ConcurLabs) |
Conference | O’Reilly TensorFlow World |
Location | Santa Clara, California |
Date | October 28-31, 2019 |
URL | Talk Page |
Slides | Talk Slides |
TensorFlow Serving is one of the cornerstones of the TensorFlow ecosystem. It has eased the deployment of machine learning models tremendously and accelerated model rollouts. Unfortunately, many machine learning engineers aren't familiar with the details of TensorFlow Serving, so they miss out on significant performance gains. Hannes Hapke provides a brief introduction to TensorFlow Serving, then leads a deep dive into advanced settings and use cases. He introduces advanced concepts and implementation suggestions to increase the performance of a TensorFlow Serving setup, including:

- how clients can request model meta-information from the model server
- model optimization options for maximal prediction throughput
- request batching to improve throughput
- an example implementation of model A/B testing
- monitoring a TensorFlow Serving setup
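The first topic above, requesting model meta-information, can be sketched against TensorFlow Serving's REST API, which exposes a `/v1/models/<name>/metadata` endpoint. The host, port, and model name below are placeholders, and this is a minimal sketch rather than code from the talk:

```python
import json
from urllib.request import urlopen

def metadata_url(host: str, port: int, model_name: str) -> str:
    # TensorFlow Serving's REST API serves model metadata at:
    #   GET /v1/models/<model_name>/metadata
    return f"http://{host}:{port}/v1/models/{model_name}/metadata"

def fetch_metadata(host: str, port: int, model_name: str) -> dict:
    # The response contains the model's signature definitions (input and
    # output tensor names, dtypes, and shapes), which a client can use to
    # validate payloads before sending prediction requests.
    with urlopen(metadata_url(host, port, model_name)) as response:
        return json.load(response)
```

Against a server with the REST API enabled on its default port, a call like `fetch_metadata("localhost", 8501, "my_model")` returns the signature definitions as a dictionary.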
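For request batching, TensorFlow Serving reads its batching settings from a text-protobuf parameters file passed via `--batching_parameters_file` (together with `--enable_batching=true`). The field values below are illustrative starting points, not recommendations from the talk:

```python
# Illustrative batching parameters for TensorFlow Serving (text protobuf).
# Start the server with:
#   tensorflow_model_server --enable_batching=true \
#       --batching_parameters_file=/path/to/batching.config
BATCHING_CONFIG = """
max_batch_size { value: 32 }          # largest batch the server assembles
batch_timeout_micros { value: 5000 }  # max wait before a partial batch runs
num_batch_threads { value: 4 }        # threads processing batches in parallel
max_enqueued_batches { value: 100 }   # queue bound before requests are rejected
"""

# Write the file the server flag points to.
with open("batching.config", "w") as config_file:
    config_file.write(BATCHING_CONFIG)
```

Tuning is a trade-off: a larger `max_batch_size` and timeout raise throughput on GPU-backed models at the cost of per-request latency.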
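The A/B-testing idea can be approximated on the client side: TensorFlow Serving's REST API can address specific loaded model versions (`/v1/models/<name>/versions/<v>:predict`), so a client can split traffic between two versions. The 90/10 split and the version numbers here are hypothetical, not the talk's implementation:

```python
import random

def predict_url(model_name: str, version: int,
                host: str = "localhost", port: int = 8501) -> str:
    # TensorFlow Serving's REST API addresses a specific loaded version:
    #   POST /v1/models/<model_name>/versions/<version>:predict
    return f"http://{host}:{port}/v1/models/{model_name}/versions/{version}:predict"

def route_request(model_name: str, b_share: float = 0.1) -> str:
    # Hypothetical client-side A/B split: send b_share of the traffic to
    # version 2 (the "B" model) and the remainder to version 1 ("A").
    version = 2 if random.random() < b_share else 1
    return predict_url(model_name, version)
```

Keeping the split in the client (or a thin routing layer) lets a single TensorFlow Serving instance host both versions while the experiment assignment stays under your control.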
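Finally, for monitoring, TensorFlow Serving can export Prometheus metrics when started with a monitoring configuration file via `--monitoring_config_file`. A minimal sketch of such a file, with the commonly used metrics path:

```python
# Illustrative monitoring configuration for TensorFlow Serving (text protobuf).
# Start the server with:
#   tensorflow_model_server --monitoring_config_file=/path/to/monitoring.config
# Prometheus can then scrape request and latency metrics from the configured
# path on the REST API port.
MONITORING_CONFIG = """
prometheus_config {
  enable: true
  path: "/monitoring/prometheus/metrics"
}
"""

with open("monitoring.config", "w") as config_file:
    config_file.write(MONITORING_CONFIG)
```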