January 26, 2020

Bighead: Airbnb's end-to-end machine learning platform

Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces.

Talk Title Bighead: Airbnb's end-to-end machine learning platform
Speakers Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018

Airbnb’s data-driven products present a wide variety of ML problems, ranging from traditional models built on structured data to state-of-the-art models that leverage unstructured data such as user reviews, messages, and images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. An end-to-end solution typically needs to cover data collection, feature engineering, training, deployment, serving, and monitoring. Presently, few platforms can do all of the above in a user-friendly way. Moreover, the heterogeneous nature of ML problems and the need for scalability make fast iteration and productionization difficult.

Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb’s user-friendly and scalable end-to-end machine learning framework that powers Airbnb’s data-driven products. Bighead is built on Python, Spark, and Kubernetes. Its components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model-building API that integrates popular libraries including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and can be iterated on, because data collection and transformation, model training environments, and production deployment are all standardized.

Atul and Xiaohan explore Bighead’s architecture, detail the problems that each individual component and the overall system aim to solve, and outline a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, with a variety of models in production, and has enabled the company to reduce model development time from months to days. Airbnb plans to open source Bighead to allow the broader community to benefit from this work.
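To make the idea of a unified model-building API more concrete, here is a minimal Python sketch of how a common interface can hide framework differences behind shared `fit`, `transform`, and `predict` methods. It is not Bighead's actual API, which the abstract does not show. Every name here (`Transformer`, `Estimator`, `MaxAbsScaler`, `SingleFeatureLinear`, `Pipeline`) is a hypothetical stand-in, and the toy estimator takes the place of a wrapped TensorFlow, XGBoost, or PyTorch model.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Row = Sequence[float]


class Transformer(ABC):
    """Hypothetical interface for a feature transformation step."""

    @abstractmethod
    def fit(self, X: Sequence[Row]) -> "Transformer": ...

    @abstractmethod
    def transform(self, X: Sequence[Row]) -> List[List[float]]: ...


class Estimator(ABC):
    """Hypothetical interface that any framework-backed model would implement."""

    @abstractmethod
    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "Estimator": ...

    @abstractmethod
    def predict(self, X: Sequence[Row]) -> List[float]: ...


class MaxAbsScaler(Transformer):
    """Scales each feature by the largest absolute value seen during fit."""

    def __init__(self) -> None:
        self.scales_: List[float] = []

    def fit(self, X: Sequence[Row]) -> "MaxAbsScaler":
        n_features = len(X[0])
        # Fall back to 1.0 for all-zero columns to avoid division by zero.
        self.scales_ = [
            max(abs(row[j]) for row in X) or 1.0 for j in range(n_features)
        ]
        return self

    def transform(self, X: Sequence[Row]) -> List[List[float]]:
        return [[v / s for v, s in zip(row, self.scales_)] for row in X]


class SingleFeatureLinear(Estimator):
    """Toy stand-in for a real model: fits y ~ w * x[0] by least squares."""

    def __init__(self) -> None:
        self.weight_ = 0.0

    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "SingleFeatureLinear":
        num = sum(row[0] * target for row, target in zip(X, y))
        den = sum(row[0] ** 2 for row in X)
        self.weight_ = num / den if den else 0.0
        return self

    def predict(self, X: Sequence[Row]) -> List[float]:
        return [self.weight_ * row[0] for row in X]


class Pipeline(Estimator):
    """Chains transformers and a final estimator so that training and
    serving apply exactly the same transformations."""

    def __init__(self, transformers: Sequence[Transformer], estimator: Estimator) -> None:
        self.transformers = list(transformers)
        self.estimator = estimator

    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "Pipeline":
        for t in self.transformers:
            X = t.fit(X).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X: Sequence[Row]) -> List[float]:
        for t in self.transformers:
            X = t.transform(X)
        return self.estimator.predict(X)


if __name__ == "__main__":
    pipeline = Pipeline([MaxAbsScaler()], SingleFeatureLinear())
    pipeline.fit([[1.0], [2.0], [4.0]], [2.0, 4.0, 8.0])
    print(pipeline.predict([[3.0]]))
```

Because the pipeline itself satisfies the `Estimator` interface, the same object can be trained offline and then handed to an online serving layer without reimplementing the feature transformations. That is one plausible way to obtain the reproducibility the abstract describes.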
