January 25, 2020

Continuous machine learning over streaming data: The story continues.

Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams.

Talk Title Continuous machine learning over streaming data: The story continues.
Speakers Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018
URL Talk Page
Slides Talk Slides
Video

This talk extends their session at Strata San Jose 2018, where Roger, Sudipto, and Kapil first presented the RRCF algorithm, which maintains an efficient sketch of a data stream and continuously adapts (learns) each time it sees a new data record. Here they discuss new applications and results, including implementation details. After briefly introducing the RRCF algorithm, they show how it can impute missing values in a data stream. They then detail how it forecasts future values when the stream is a time series, and describe how it can detect emerging hotspots in a data stream and perform multiclass classification over streaming data. For each application of the RRCF, Roger, Sudipto, and Kapil present an actual customer use case along with the results of experiments that compare the RRCF application with best-in-class methods. They conclude with a deep dive into the efficient implementation of the RRCF algorithm that enables it to operate and continuously learn in real time over massive data streams.
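To make the idea of a bounded sketch that adapts with each new record more concrete, here is a minimal Python sketch. It is not the speakers' implementation. It assumes the sketch is a fixed-size reservoir sample of the stream, builds random cut trees over that sample (cut dimension chosen in proportion to its range), and uses average isolation depth as a simplified stand-in for the anomaly score that RRCF actually computes. All names (`Reservoir`, `build_tree`, `isolation_depth`) and parameters are hypothetical.

```python
import random


class Reservoir:
    """Fixed-size uniform sample of a stream (assumed stand-in for the sketch)."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0

    def add(self, item):
        # Classic reservoir sampling: each record seen so far is kept
        # with equal probability capacity / seen.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def build_tree(points, rng):
    """Recursively split points with random axis-aligned cuts."""
    if len(points) <= 1:
        return ("leaf",)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    if sum(spans) == 0:  # all points identical, nothing left to cut
        return ("leaf",)
    while True:
        # Choose the cut dimension with probability proportional to its range.
        d = rng.choices(range(dims), weights=spans)[0]
        cut = rng.uniform(lows[d], highs[d])
        left = [p for p in points if p[d] <= cut]
        right = [p for p in points if p[d] > cut]
        if left and right:  # retry the rare cut that separates nothing
            break
    return ("node", d, cut, build_tree(left, rng), build_tree(right, rng))


def depth(tree, point, level=0):
    """Number of cuts between the root and the leaf that holds point."""
    if tree[0] == "leaf":
        return level
    _, d, cut, left, right = tree
    return depth(left if point[d] <= cut else right, point, level + 1)


def isolation_depth(sample, point, rng, num_trees=50):
    """Average depth of point when inserted into trees built over the sample.

    A shallow depth means few cuts isolate the point, so it looks unusual.
    This is a simplification, not the score used by RRCF itself.
    """
    total = 0
    for _ in range(num_trees):
        total += depth(build_tree(sample + [point], rng), point)
    return total / num_trees


rng = random.Random(7)
reservoir = Reservoir(capacity=64, rng=rng)
for _ in range(1000):  # simulated stream of 2-D records in the unit square
    reservoir.add((rng.random(), rng.random()))

inlier_depth = isolation_depth(reservoir.items, (0.5, 0.5), rng)
outlier_depth = isolation_depth(reservoir.items, (100.0, 100.0), rng)
print("inlier depth:", inlier_depth)
print("outlier depth:", outlier_depth)
```

In this toy setup the far-away record is usually separated from the sample by the first cut, so its average depth comes out much smaller than that of a record inside the cluster. Memory stays bounded by the reservoir capacity no matter how long the stream runs.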