Data science at eHarmony: A generalized framework for personalization
Data science has always been a focus at eHarmony, but recently more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.
|Talk Title||Data science at eHarmony: A generalized framework for personalization|
|Conference||Strata + Hadoop World|
|Conf Tag||Make Data Work|
|Location||New York, New York|
|Date||September 27-29, 2016|
eHarmony has been using machine learning for about eight years. During this time, it has learned a number of lessons about implementing machine learning at scale that allow it to address problems rapidly and accurately. Recently, more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.

Jonathan first explains why it’s so important for data scientists and engineers to work together, outlining specific real-world problems that can arise when they don’t. He then builds the case for a unified modeling framework with feature extraction built into the model representation and introduces eHarmony’s open source modeling framework, Aloha, demonstrating how it lets eHarmony define a common interface between engineering and data science so that each side can move rapidly and, more importantly, at its own pace.

Jonathan also explores how eHarmony uses Apache Spark to rapidly train, validate, test, and deploy models automatically. As an aside, he covers spotz, the hyperparameter optimization tool eHarmony has created and open sourced, giving the audience a taste of how eHarmony applies engineering on the modeling side to train models on large amounts of data.

Finally, Jonathan takes a deep dive into eHarmony’s matching algorithm and discusses recent advancements in predicting user behavior. He then goes over how eHarmony uses contextual bandits to help users get the best matching experience every day and touches on a very recently observed phenomenon in which eHarmony gets a significant lift in matching by training on an intermediate signal. He will also discuss some open research questions that the eHarmony team is currently working to address.