Building machine learning inference pipelines at scale
Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.
| Talk Title | Building machine learning inference pipelines at scale |
| --- | --- |
| Speakers | Julien Simon (AWS) |
| Conference | O’Reilly Open Source Software Conference |
| Conf Tag | Fueling innovative software |
| Location | Portland, Oregon |
| Date | July 15-18, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Real-life ML workloads typically require more than training and predicting: data often needs to be preprocessed and postprocessed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm but a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to use Apache Spark MLlib to build ML pipelines and discusses scaling options when datasets grow huge. As the cloud is a popular way to scale, he dives into how to implement inference pipelines on AWS using Apache Spark and scikit-learn, as well as ML algorithms implemented by Amazon.
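To make the idea concrete, here is a minimal sketch of such an inference pipeline using scikit-learn, one of the libraries the talk covers. The toy dataset, feature values, and step names are illustrative assumptions, not taken from the talk; the point is that preprocessing and prediction are chained into a single object, so raw data goes in and predictions come out.

```python
# A minimal scikit-learn inference pipeline: preprocessing + model in one object.
# The toy data and step names below are illustrative assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy training data: 6 samples, 2 numeric features, binary labels.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0],
              [8.0, 300.0], [9.0, 320.0], [10.0, 310.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Chain the preprocessing and prediction steps; fit() trains them in sequence.
pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing step
    ("model", LogisticRegression()),  # prediction step
])
pipeline.fit(X, y)

# At inference time, raw samples flow through every step in order.
preds = pipeline.predict(np.array([[1.5, 190.0], [9.5, 315.0]]))
print(preds)
```

Apache Spark MLlib exposes the same abstraction (a `Pipeline` of stages), and on AWS the equivalent pattern is a SageMaker inference pipeline, where containers for preprocessing, prediction, and postprocessing are deployed behind a single endpoint.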