February 14, 2020

Executive Briefing: What it takes to use machine learning in fast data pipelines

Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime.


Talk Title	Executive Briefing: What it takes to use machine learning in fast data pipelines
Speakers	Dean Wampler (Anyscale)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 24-26, 2019

Dean Wampler helps you develop a conceptual understanding of the challenges your teams face as they develop and deploy machine learning (ML) and artificial intelligence (AI) services integrated with fast data (streaming) pipelines. Combining these technologies is challenging, but the benefits include timely delivery of innovative services to your customers. You’ll get a brief overview of the business justification for integrating ML and AI with streaming, as well as the ML and AI scenarios that are best delivered through streaming.

Dean then walks you through the main challenges of using these technologies together. The first is bridging the gap between data science and production teams, whose tools, methods, and goals sometimes conflict: for example, exploring ideas and optimizing scoring results versus production reliability and efficiency. The second is that streaming ML and AI services must run reliably and handle variable loads for a long time, which requires you to leverage best practices from the microservices world. The third is updating models in the streaming application, without downtime, before they become stale, along with other practical problems.

Executive Briefing: Unpacking AutoML

February 5, 2020

Paco Nathan outlines the history and landscape for vendors, open source projects, and research efforts related to AutoML. Starting from the perspective of an AI expert practitioner who speaks business fluently, Paco unpacks the ground truth of AutoML, translating from the hype into business concerns and practices in a vendor-neutral way.

Introducing Kubeflow (with special guests TensorFlow and Apache Spark)

February 4, 2020

Modeling is easy; productizing models, less so. Distributed training? Forget about it. Say hello to Kubeflow with Holden Karau. Kubeflow is a system that makes it easy for data scientists to containerize their models to train and serve on Kubernetes.

Executive Briefing: What it takes to use machine learning in fast data pipelines

January 11, 2020

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities into streaming data pipelines so you can use the results quickly and update them as needed. He also covers challenges such as building long-running services that are reliable and scalable and combining a spectrum of very different tools, from data science to operations.

Executive Briefing: What it takes to use machine learning in fast data pipelines

December 25, 2019

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities into streaming data pipelines so you can use the results quickly and update them as needed. He also covers challenges such as building long-running services that are reliable and scalable and combining a spectrum of very different tools, from data science to operations.

Scalable anomaly detection with Spark and SOS

February 10, 2020

Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case.

Staying safe in the AI era

February 10, 2020

Machine learning and artificial intelligence are no longer science fiction, so now you have to address what it takes to harness their potential effectively, responsibly, and reliably. Based on lessons learned at Google, Cassie Kozyrkov offers actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate.