Introducing KFServing: Serverless Model Serving on Kubernetes
| | |
|:--|:--|
|**Talk Title**|Introducing KFServing: Serverless Model Serving on Kubernetes|
|**Speakers**|Dan Sun (Senior Software Engineer, Bloomberg); Ellis Bigelow (Software Engineer, Google)|
|**Conference**|KubeCon + CloudNativeCon North America|
|**Location**|San Diego, CA, USA|
|**Date**|Nov 15-21, 2019|
Production-grade serving of ML models is a challenging task for data scientists. In this talk, we'll discuss how KFServing powers real-world examples of inference in production at Bloomberg, supporting business domains such as NLP, computer vision, and time-series analysis. KFServing (https://github.com/kubeflow/kfserving) provides a Kubernetes CRD for serving ML models built on arbitrary frameworks. It aims to solve 80% of model serving use cases by providing performant, high-level interfaces for common ML frameworks. It offers a consistent and richly featured abstraction that supports bleeding-edge serving features like CPU/GPU auto-scaling, scale to and from zero, and canary rollouts. KFServing's charter includes a rich roadmap toward a complete story for mission-critical ML, including inference graphs, model explainability, outlier detection, and payload logging.
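To make the CRD concrete, here is a minimal sketch of an inference service manifest with a canary rollout, assuming the `v1alpha2` API schema from around the time of this talk; the resource name, storage URIs, and traffic split are illustrative, not taken from the talk:

```yaml
# Sketch of a KFServing resource (v1alpha2-era schema; names are illustrative).
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  # "default" serves the bulk of the traffic.
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
  # "canary" receives a small slice of traffic for the new model version.
  canary:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris-v2"
  canaryTrafficPercent: 10
```

Applying a manifest like this with `kubectl apply -f` asks KFServing to stand up both revisions behind one endpoint, routing roughly 10% of requests to the canary; promoting the canary is then a matter of editing the traffic percentage rather than re-deploying.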