Introducing KFServing: Serverless Model Serving on Kubernetes
Production-grade serving of ML models is a challenging task for data scientists. In this talk, we'll discuss how KFServing powers some real-world examples of inference in production at Bloomberg, whic …
Talk Title | Introducing KFServing: Serverless Model Serving on Kubernetes |
Speakers | Dan Sun (Senior Software Engineer, Bloomberg), Ellis Bigelow (Software Engineer, Google) |
Conference | KubeCon + CloudNativeCon North America |
Conf Tag | |
Location | San Diego, CA, USA |
Date | Nov 15-21, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Production-grade serving of ML models is a challenging task for data scientists. In this talk, we’ll discuss how KFServing powers some real-world examples of inference in production at Bloomberg, which supports the business domains of NLP, computer vision, and time-series analysis. KFServing (https://github.com/kubeflow/kfserving) provides a Kubernetes CRD for serving ML models on arbitrary frameworks. It aims to solve 80% of model serving use cases by providing performant, high abstraction interfaces for common ML frameworks. It provides a consistent and richly featured abstraction that supports bleeding-edge serving features like CPU/GPU auto-scaling, scale to and from 0, and canary rollouts. KFServing’s charter includes a rich roadmap to fulfill a complete story for mission critical ML, including inference graphs, model explainability, outlier detection, and payload logging.