Data science at eHarmony: A generalized framework for personalization

Data science has always been a focus at eHarmony, but recently more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.


Talk Title	Data science at eHarmony: A generalized framework for personalization
Speakers
Conference	Strata + Hadoop World
Conf Tag	Make Data Work
Location	New York, New York
Date	September 27-29, 2016
URL	Talk Page
Slides	Talk Slides
Video

eHarmony has been using machine learning for about eight years. During this time, it has learned a number of lessons about implementing machine learning at scale that allow it to address problems rapidly and accurately. Recently, more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.

Jonathan first explains why it’s so important for data scientists and engineers to work together, outlining specific real-world problems that can arise when they don’t. He then builds the case for a unified modeling framework with feature extraction built into the model representation and introduces eHarmony’s open source modeling framework, Aloha, demonstrating how Aloha lets eHarmony define a common interface between engineering and data science that allows each side to move rapidly and, more importantly, at its own pace.

Jonathan also explores how eHarmony uses Apache Spark to rapidly train, validate, test, and deploy models automatically and offers an aside on spotz, the hyperparameter optimization tool eHarmony has created and open sourced, giving the audience a taste of how eHarmony applies engineering on the modeling side to train models on large amounts of data.

Finally, Jonathan takes a deep dive into eHarmony’s matching algorithm and discusses recent advancements in predicting user behavior. He goes over how eHarmony uses contextual bandits to help users get the best matching experience every day and touches on a very recently observed phenomenon in which eHarmony gets a significant lift in matching by training on an intermediate signal. Jonathan will also discuss some open research questions at eHarmony that the team is currently working to address.
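The idea of a model representation that carries its own feature extraction can be illustrated with a minimal sketch. This is not Aloha's API: the `ModelSpec` class, the feature names, the weights, and the record fields below are all hypothetical, chosen only to show how bundling feature functions with model parameters gives engineering a single `score(raw_record)` interface while data science remains free to change features and weights.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

# Hypothetical names throughout; this is an illustration, not Aloha's API.
Feature = Callable[[Mapping[str, Any]], float]


@dataclass(frozen=True)
class ModelSpec:
    """A logistic model whose feature extraction travels with its weights."""

    features: Dict[str, Feature]  # feature name -> function of the raw record
    weights: Dict[str, float]     # feature name -> learned coefficient
    bias: float = 0.0

    def score(self, raw: Mapping[str, Any]) -> float:
        # Extract every feature from the raw record, then apply the model.
        z = self.bias + sum(
            self.weights[name] * extract(raw)
            for name, extract in self.features.items()
        )
        return 1.0 / (1.0 + math.exp(-z))


# Illustrative features and weights (made up for this sketch).
spec = ModelSpec(
    features={
        "age_gap": lambda r: float(abs(r["user_age"] - r["candidate_age"])),
        "same_city": lambda r: 1.0 if r["user_city"] == r["candidate_city"] else 0.0,
    },
    weights={"age_gap": -0.1, "same_city": 0.8},
)

# The caller passes only the raw record and never touches feature code.
record = {
    "user_age": 30,
    "candidate_age": 34,
    "user_city": "LA",
    "candidate_city": "LA",
}
probability = spec.score(record)
```

Under this kind of design, the serving code depends only on the shape of the raw record and the `score` call, so a new feature or a retrained set of weights ships as a new model specification rather than as a change to production code.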

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

Scala and the JVM as a big data platform: Lessons from Apache Spark

The future of column-oriented data processing with Arrow and Parquet

Using graph databases to operationalize insights from big data

AI is not a matter of strength but of intelligence