December 5, 2019

StreamDM: Advanced data science with Spark Streaming

Talk Title: StreamDM: Advanced data science with Spark Streaming
Speakers: Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 22-24, 2018

Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, an open source library for real-time analytics built on top of Spark Streaming and developed at Huawei’s Noah’s Ark Lab and Télécom ParisTech. StreamDM’s tools and algorithms are designed specifically for data streams. Because real-time streams produce large volumes of data that must be processed as they arrive, such methods need to be extremely time efficient while using very little memory. StreamDM is the first library to include advanced stream-mining algorithms for Spark Streaming. It is intended to be the open source gathering point for research on and implementation of data stream mining methods, while also supporting practical deployments on real-world datasets. The library contains methods for classification, regression, clustering, and frequent pattern mining. Heitor and Albert explain how these advanced methods work in practice, discuss some big data analytics applications in telecommunication networks, compare the methods with those available in MLlib and Spark ML, and demonstrate the library’s ease of use and extensibility.
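
To make the "time efficient, very little memory" requirement concrete, here is a minimal sketch in plain Python with no Spark dependency. It assumes a binary classification stream of numeric feature vectors and uses online logistic regression trained by stochastic gradient descent as a swapped-in example of a single-pass learner: each instance is used once and then discarded, and memory stays at one weight per feature no matter how long the stream runs. Accuracy is measured prequentially (test-then-train), a common evaluation scheme in stream mining. All class and function names (`OnlineLogisticRegression`, `learn_one`, `prequential_accuracy`, `synthetic_stream`) are hypothetical and are not StreamDM's API.

```python
import math
import random
from typing import Iterable, Iterator, List, Tuple


# Hypothetical sketch, not StreamDM's API: a single-pass binary classifier
# whose memory is one weight per feature plus a bias, regardless of how
# many instances the stream delivers.
class OnlineLogisticRegression:
    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: List[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        # Numerically stable sigmoid: avoids exp() overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def predict(self, x: List[float]) -> int:
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x: List[float], y: int) -> None:
        # One SGD step on log-loss; its gradient with respect to z is (p - y).
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def prequential_accuracy(
    model: OnlineLogisticRegression,
    stream: Iterable[Tuple[List[float], int]],
) -> float:
    """Test-then-train: predict on each instance, then learn from it,
    then drop it. Nothing from the stream is stored."""
    correct = seen = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0


def synthetic_stream(n: int, seed: int = 42) -> Iterator[Tuple[List[float], int]]:
    # Hypothetical toy stream: the label is 1 when the first feature exceeds the second.
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, 1 if x[0] > x[1] else 0


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    accuracy = prequential_accuracy(model, synthetic_stream(5000))
    print(f"prequential accuracy: {accuracy:.3f}")
```

Evaluation and training share a single pass over the data, which matches the constraint that each instance is processed as it arrives rather than stored. A distributed library would additionally have to spread this work across micro-batches and workers, which the sketch deliberately leaves out.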
