January 5, 2020

Automating cloud cluster deployment: Beyond the book

Speed and reliability in deploying big data clusters are key to working effectively in the cloud. Drawing on ideas from his book Moving Hadoop to the Cloud, which covers essential practices like baking images and automating cluster configuration, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered through the cloud provider to scale them up.

Talk Title: Automating cloud cluster deployment: Beyond the book
Speakers: Bill Havanki (Cloudera)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 26-28, 2017

Often, when an organization first ventures into the cloud to run Hadoop clusters, it carries over practices that worked well on-premises, along with the idea that each cluster should last a long time and be carefully tended. It soon becomes apparent that there is a different, perhaps more effective way: deploying clusters on demand, scaling them as needed, and destroying them to save costs when demand slackens.

The problem is that deploying a cluster in the cloud is a lot of work. There is still the usual installation and configuration of all the cluster services, but now you also need to think about allocating instances, placing them into your virtual networks, setting up security, creating new accounts, and more. How can all of that be done quickly enough to support an agile system of cloud cluster management?

Drawing on ideas from his book Moving Hadoop to the Cloud, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered through the cloud provider to scale them up. Moving Hadoop to the Cloud covers many of the techniques you need, including creating instance images with most of your work baked in ahead of time, using automation to handle the rest of the work, and devising your own cloud-based metrics, tailored to Hadoop clusters, that tell you when your cluster could use more resources.

Bill then takes you further, demonstrating how to automate the creation of entire clusters, relying on the cloud provider API and your own scripting to make it happen. Once you can create new clusters automatically, you can also trigger similar actions from your metrics to scale a cluster up in response to demand, fully harnessing the flexibility of the cloud for effective cluster management.
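
As a rough illustration of triggering a scale-up from metrics, the following Python sketch decides how many worker instances to add based on recent utilization samples. It is not taken from the talk or the book: the `ScalingPolicy` class, the `workers_to_add` function, and all threshold values are hypothetical, and the step that actually requests instances from the cloud provider API is left out because it depends on the provider.

```python
from dataclasses import dataclass
from math import ceil
from statistics import mean


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy values; tune them for a real cluster."""
    target_utilization: float = 0.70  # utilization to aim for after scaling
    scale_up_threshold: float = 0.85  # average utilization that triggers scaling
    max_workers: int = 50             # upper bound on cluster size to cap cost


def workers_to_add(samples, current_workers, policy=ScalingPolicy()):
    """Return the number of workers to add, or 0 if no scale-up is needed.

    `samples` is a list of recent utilization readings between 0.0 and 1.0,
    for example values pulled from the cloud provider's monitoring service.
    """
    if not samples or current_workers <= 0:
        return 0
    average = mean(samples)
    if average < policy.scale_up_threshold:
        return 0
    # Size the cluster so the same load would sit at the target utilization.
    desired = ceil(current_workers * average / policy.target_utilization)
    desired = min(desired, policy.max_workers)
    return max(desired - current_workers, 0)


if __name__ == "__main__":
    # A busy 10-worker cluster: the policy suggests adding workers.
    print(workers_to_add([0.90, 0.95, 0.92], current_workers=10))
```

Averaging several samples, rather than reacting to a single reading, is one simple way to avoid launching instances in response to a brief spike. The result of `workers_to_add` would then feed whatever automation already creates and configures worker instances.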