Continuous machine learning over streaming data: The story continues.


Talk Title	Continuous machine learning over streaming data: The story continues.
Speakers	Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 11-13, 2018
URL	Talk Page
Slides	Talk Slides

Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data. They share new applications: imputing missing values, forecasting future values, detecting hotspots, and performing classification. They also demonstrate how to implement unsupervised learning over massive data streams.

This talk extends their presentation at Strata San Jose 2018, where they first introduced the RRCF algorithm, which maintains an efficient sketch of a data stream and continuously adapts (learns) each time it sees a new data record. Here Roger, Sudipto, and Kapil discuss new applications and results, including implementation details. After briefly introducing RRCF, they apply it to imputing missing values in a data stream. They then show how it forecasts future values when the stream is a time series, how it detects emerging hotspots in a data stream, and how it performs multiclass classification over streaming data. For each application, they present an actual customer use case along with experimental results comparing RRCF against best-in-class methods. They conclude with a deep dive into the efficient implementation of the RRCF algorithm that allows it to operate and learn continuously, in real time, over massive data streams.
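To make the idea of a random cut forest over a stream concrete, here is a minimal Python sketch of random-cut-tree anomaly scoring over a sliding window. It assumes numeric points given as tuples. Each tree picks a dimension with probability proportional to that dimension's range and cuts uniformly within the range, and a point's score is its collusive displacement (CoDisp): how many other points its removal would "displace", relative to the size of its own subtree. Unlike the RRCF described in the talk, which updates its trees incrementally as records arrive and expire, this sketch simply rebuilds every tree from the window for each new point, trading efficiency for brevity. The names `Node`, `build_tree`, `codisp`, and `WindowedRCF` are hypothetical and are not the speakers' code or any AWS API.

```python
import random
from collections import deque
from statistics import mean


class Node:
    """Random cut tree node; a leaf may hold several identical points."""

    def __init__(self, n, parent=None):
        self.n = n              # number of points stored in this subtree
        self.parent = parent
        self.left = None
        self.right = None


def build_tree(data, idx, rng, leaves, parent=None):
    """Recursively split the points data[i], i in idx, with random cuts.

    Records in `leaves` the leaf node that holds each point index."""
    node = Node(len(idx), parent)
    dims = len(data[idx[0]])
    lo = [min(data[i][d] for i in idx) for d in range(dims)]
    hi = [max(data[i][d] for i in idx) for d in range(dims)]
    spans = [h - l for h, l in zip(hi, lo)]
    if sum(spans) == 0:                     # one point, or all identical: make a leaf
        for i in idx:
            leaves[i] = node
        return node
    # Pick a dimension with probability proportional to its range ...
    d = rng.choices(range(dims), weights=spans)[0]
    while True:
        # ... and a cut value uniformly within that range.
        cut = rng.uniform(lo[d], hi[d])
        left = [i for i in idx if data[i][d] <= cut]
        right = [i for i in idx if data[i][d] > cut]
        if left and right:                  # retry the rare cut that lands on hi[d]
            break
    node.left = build_tree(data, left, rng, leaves, node)
    node.right = build_tree(data, right, rng, leaves, node)
    return node


def codisp(leaf):
    """Collusive displacement: max over the leaf's ancestors of
    (size of sibling subtree) / (size of subtree containing the leaf)."""
    best, node = 0.0, leaf
    while node.parent is not None:
        parent = node.parent
        sibling = parent.right if parent.left is node else parent.left
        best = max(best, sibling.n / node.n)
        node = parent
    return best


class WindowedRCF:
    """Hypothetical scorer: averages CoDisp of each new point over a forest
    of trees rebuilt from a sliding window (not the incremental RRCF update)."""

    def __init__(self, num_trees=20, window=64, seed=0):
        self.rng = random.Random(seed)
        self.num_trees = num_trees
        self.window = deque(maxlen=window)

    def score(self, point):
        self.window.append(tuple(point))    # the oldest point drops out automatically
        data = list(self.window)
        newest = len(data) - 1
        scores = []
        for _ in range(self.num_trees):
            leaves = {}
            build_tree(data, list(range(len(data))), self.rng, leaves)
            scores.append(codisp(leaves[newest]))
        return mean(scores)
```

For example, after feeding `WindowedRCF().score` a stream of points scattered around the origin, a far-away point such as `(10.0, 10.0)` tends to receive a much larger score than typical points, because random cuts usually isolate it close to the root, where its sibling subtree holds most of the window. Rebuilding each tree costs work proportional to the window size times the tree depth for every arriving point; avoiding that cost is exactly what an incremental insert/delete scheme like the one the talk describes is for.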

Machine learning for nonstationary streaming data using Structured Streaming and StreamDM

Continuous machine learning over streaming data

Scaling data infrastructure in the fashion world; or, What is this? Business intelligence for ants?

Building deep reinforcement learning applications on BigDL and Spark

From flat files to deconstructed database: The evolution and future of the big data ecosystem

Lightning Talk: Artificial Intelligence the Next Digital Wave for Telcos