Meta-data science: When all the world's data scientists are just not enough
What if you had to build more models than there are data scientists in the world, a feat enterprise companies serving hundreds of thousands of businesses often have to pull off? Leah McGuire offers an overview of Salesforce's general-purpose machine-learning platform, which automatically builds per-company optimized models for any given predictive problem at scale, beating out most hand-tuned models.
Talk Title | Meta-data science: When all the world's data scientists are just not enough |
Speakers | Leah McGuire (Salesforce) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | May 23-25, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Due to privacy concerns and the nature of SaaS businesses, platforms like CRM systems often have to provide intelligent, data-driven features built from many unique, per-customer machine-learned models. For Salesforce, this means building hundreds of thousands of models, each tuned to a distinct customer, for any given data-driven application. Leah McGuire offers an overview of Einstein, Salesforce's homegrown Spark ML-based machine-learning platform. Einstein's automated feature engineering yields much faster modeling turnaround and higher accuracy than general-purpose modeling libraries such as scikit-learn; its automatic hyperparameter optimization, feature selection, and model selection produce a strong model for each specific customer; its modular workflows and transformations complement systems like Spark ML and KeystoneML; and its scale enables training thousands of models per day.
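To make the idea of automated, per-customer model selection concrete, here is a minimal sketch in Python using scikit-learn (the library the description mentions only as a comparison point). This is an illustrative assumption of what such automation might look like, not Salesforce's actual Einstein code; the customer names and the candidate model grid are invented for the example.

```python
# Hedged sketch: automated per-customer model and hyperparameter selection.
# This illustrates the general technique the talk describes; it is NOT
# Salesforce's implementation, which is built on Spark ML at far larger scale.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def fit_best_model(X, y):
    """Search a small model/hyperparameter space; return the best estimator."""
    candidates = [
        (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
        (RandomForestClassifier(random_state=0), {"n_estimators": [10, 50]}),
    ]
    best_score, best_model = -1.0, None
    for estimator, grid in candidates:
        # Cross-validated grid search = automatic hyperparameter optimization.
        search = GridSearchCV(estimator, grid, cv=3)
        search.fit(X, y)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_
    return best_model, best_score


# Each "customer" has its own data distribution, so each gets its own
# independently tuned model (customer names here are hypothetical).
customers = {
    name: make_classification(n_samples=200, n_features=10, random_state=seed)
    for name, seed in [("acme", 0), ("globex", 1)]
}
models = {name: fit_best_model(X, y) for name, (X, y) in customers.items()}
for name, (model, score) in models.items():
    print(name, type(model).__name__, round(score, 2))
```

At Salesforce's scale the same loop would run over hundreds of thousands of tenants on a distributed engine such as Spark, but the core pattern, searching a model and hyperparameter space per customer and keeping the cross-validated winner, is the same.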