December 30, 2019


Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.

Talk Title: Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Speakers: Brooke Wenig (Databricks)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 26-28, 2017

Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case. Join in to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting). You’ll also learn about model selection, training/fitting, and validation/inspection, as well as parameter tuning with grid search. The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.
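
To give a feel for the kind of pipeline the abstract describes, here is a minimal sketch in Scala. It is not taken from the talk materials: the object name `TextMiningSketch`, the column names, the toy sentences and labels, and the grid values are all assumptions made up for illustration. It chains a `Tokenizer` and a `HashingTF` (term frequency counting) into a `LogisticRegression` model, then tunes two parameters with a grid search under cross-validation.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; not part of the talk's lab code.
object TextMiningSketch {

  // Expects a DataFrame with a string column "text" and a double column "label".
  def fit(training: DataFrame): CrossValidatorModel = {
    // Transformers: split free text into words, then hash words into term-frequency vectors.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    // Estimator: a simple classifier reading the default "features" and "label" columns.
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Grid search over 2 x 2 = 4 parameter combinations (values chosen arbitrarily).
    val paramGrid = new ParamGridBuilder()
      .addGrid(hashingTF.numFeatures, Array(100, 1000))
      .addGrid(lr.regParam, Array(0.1, 0.01))
      .build()

    // Cross-validation fits the whole pipeline for each combination and keeps the best.
    new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(2)
      .fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("text-mining-sketch")
      .getOrCreate()
    import spark.implicits._

    // Invented toy data: label 1.0 for sentences mentioning spark, 0.0 otherwise.
    val training = Seq(
      ("spark makes text mining simple", 1.0),
      ("the weather is cold today", 0.0),
      ("spark ml pipelines chain transformers", 1.0),
      ("lunch was served at noon", 0.0),
      ("dataframes in spark are distributed", 1.0),
      ("the train arrived late again", 0.0),
      ("spark supports grid search tuning", 1.0),
      ("coffee is best in the morning", 0.0)
    ).toDF("text", "label")

    val model = fit(training)

    // Score unseen text with the best pipeline found by the search.
    val unseen = Seq("spark pipeline tuning", "rain in the morning").toDF("text")
    model.transform(unseen).select("text", "prediction").show(false)

    spark.stop()
  }
}
```

Running this requires a Spark 2.x (or later) installation with the `spark-mllib` module on the classpath. With so little data the chosen parameters are not meaningful; the point is only the shape of the workflow: transformers feed an estimator, and the tuning step wraps the whole pipeline rather than just the model.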
