November 25, 2019

Hive as a service

Talk Title: Hive as a service
Speakers: Szehon Ho (Criteo), Pawel Szostek (Criteo)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018

Hive is the main data transformation tool at Criteo, and hundreds of analysts and thousands of automated jobs run Hive queries every day. Szehon Ho and Pawel Szostek discuss the evolution of Criteo’s Hive platform from an error-prone add-on installed on some spare machines to a best-in-class installation capable of self-healing and automatically scaling to handle its growing load. The resulting platform is based on Mesos, which has allowed Criteo to scale on demand and better utilize resources, iterate on development much faster than on bare metal, and roll out new versions seamlessly without downtime for its users. Finally, it has allowed the company to eliminate the last single point of failure (SPOF) in its Hive stack. Szehon and Pawel detail Criteo’s data architecture and explain how the company solved challenges in security, monitoring, scheduling, and load balancing on multiple layers. They also discuss the gains achieved through this process.