January 31, 2020


Building machine learning inference pipelines at scale


Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.

Talk Title Building machine learning inference pipelines at scale
Speakers Julien Simon (AWS)
Conference O’Reilly Open Source Software Conference
Conf Tag Fueling innovative software
Location Portland, Oregon
Date July 15-18, 2019
URL Talk Page
Slides Talk Slides
Video

Real-life ML workloads typically require more than training and predicting: data often needs to be preprocessed and postprocessed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm but a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to use Apache Spark MLlib to build ML pipelines and discusses scaling options when datasets grow huge. As the cloud is a popular way to scale, he dives into how to implement inference pipelines on AWS using Apache Spark and scikit-learn, as well as ML algorithms implemented by Amazon.
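The core idea of an inference pipeline — a fixed sequence of preprocessing, prediction, and postprocessing steps applied to raw data — can be sketched in a few lines. The sketch below is illustrative only: it is not the Spark MLlib `Pipeline` or the Amazon SageMaker inference-pipeline API, and the `Pipeline` class, step functions, and feature names are all hypothetical.

```python
from typing import Callable, List


class Pipeline:
    """Applies a list of steps to an input, in order (hypothetical class)."""

    def __init__(self, steps: List[Callable]):
        self.steps = steps

    def predict(self, raw):
        data = raw
        for step in self.steps:
            data = step(data)  # output of each step feeds the next
        return data


def preprocess(record: dict) -> list:
    # Turn a raw record into a numeric feature vector.
    return [float(record["height_cm"]) / 100, float(record["weight_kg"]) / 100]


def predict(features: list) -> float:
    # Stand-in for a trained model: a fixed linear score.
    weights = [0.6, 0.4]
    return sum(w * x for w, x in zip(weights, features))


def postprocess(score: float) -> str:
    # Map the raw score to a human-readable label.
    return "high" if score > 1.0 else "low"


pipeline = Pipeline([preprocess, predict, postprocess])
print(pipeline.predict({"height_cm": 180, "weight_kg": 80}))  # prints "high"
```

In Spark MLlib and SageMaker the same pattern holds, except each step is a fitted transformer or a deployed model container, and the framework handles passing intermediate results between them at scale.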
