Building machine learning inference pipelines at scale
Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.
| Talk Title | Building machine learning inference pipelines at scale |
| --- | --- |
| Speakers | Julien Simon (AWS) |
| Conference | O’Reilly Open Source Software Conference |
| Conf Tag | Fueling innovative software |
| Location | Portland, Oregon |
| Date | July 15-18, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Real-life ML workloads typically require more than training and predicting: data often needs to be preprocessed and postprocessed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm but a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to use Apache Spark MLlib to build ML pipelines and discusses scaling options when datasets grow huge. As the cloud is a popular way to scale, he dives into how to implement inference pipelines on AWS using Apache Spark and scikit-learn, as well as ML algorithms implemented by Amazon.
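To make the idea concrete, here is a minimal sketch of such an inference pipeline using scikit-learn, one of the libraries the talk covers. The toy dataset, feature values, and step names are illustrative assumptions, not taken from the talk; the point is that preprocessing and prediction are chained into a single object, so raw data goes in and predictions come out.

```python
# A minimal scikit-learn inference pipeline: preprocessing + model in one object.
# The toy data and step names below are illustrative assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy training data: 6 samples, 2 numeric features, binary labels.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0],
              [8.0, 300.0], [9.0, 320.0], [10.0, 310.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Chain the preprocessing and prediction steps; fit() trains them in sequence.
pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing step
    ("model", LogisticRegression()),  # prediction step
])
pipeline.fit(X, y)

# At inference time, raw samples flow through every step in order.
preds = pipeline.predict(np.array([[1.5, 190.0], [9.5, 315.0]]))
print(preds)
```

Apache Spark MLlib exposes the same abstraction (a `Pipeline` of stages), and on AWS the equivalent pattern is a SageMaker inference pipeline, where containers for preprocessing, prediction, and postprocessing are deployed behind a single endpoint.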