December 5, 2019


The cloud is expensive, so build your own redundant Hadoop clusters.


Criteo runs a production cluster of 2,000 nodes, handling over 300,000 jobs a day, alongside a backup cluster, both in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full data center.

Talk Title The cloud is expensive, so build your own redundant Hadoop clusters.
Speakers Stuart Pook (Criteo)
Conference Strata Data Conference
Conf Tag Making Data Work
Location London, United Kingdom
Date May 22-24, 2018
URL Talk Page
Slides Talk Slides

Criteo has a main production cluster of 2,000 nodes that runs over 300,000 jobs a day, along with a backup cluster of 1,200 nodes. Criteo’s job is to keep these two clusters running together while it builds a new cluster to replace the backup cluster. The clusters are in the company’s own data centers, as running in the cloud would be many times more expensive. Together they were meant to provide a redundant solution to Criteo’s storage and compute needs, including a tested failover mechanism.

Building a cluster requires testing hardware from several manufacturers and choosing the most cost-effective option. Criteo has now done these tests twice and can provide advice on how to do it right the first time. The tests were effective except for the RAID controller for the company’s 35,000 disks: Criteo had so many problems with the new controller that it had to replace it, and it is now working on a solution that will help the company better manage its disks.

Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo’s progress in building another cluster to survive the loss of a full data center.
