Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.
Talk Title | Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML |
Speakers | Brooke Wenig (Databricks) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 26-28, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case. Join in to explore and wrangle data using Spark’s DataSet and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline to transform free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting) and learn about model selection, training/fitting, and validation/inspection, as well as parameter tuning with grid search parameter selection. The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.