Automating cloud cluster deployment: Beyond the book
Speed and reliability in deploying big data clusters are key to working effectively in the cloud. Drawing on ideas from his book Moving Hadoop to the Cloud, which covers essential practices like baking images and automating cluster configuration, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered through the cloud provider to scale up.
Talk Title | Automating cloud cluster deployment: Beyond the book |
Speakers | Bill Havanki (Cloudera) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 26-28, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Often, when an organization first ventures into the cloud for running Hadoop clusters, it carries over practices that worked well on-premises, along with the idea that each cluster should last a long time and be carefully tended. It soon becomes apparent that there is a different, perhaps more effective way: deploying clusters on demand, scaling them as needed, and destroying them to save costs when demand slackens.
The problem is that deploying a cluster in the cloud is a lot of work. There’s still the usual installation and configuration for all of the cluster services, but now you also need to think about allocating instances, placing them into your virtual networks, setting up security, creating new accounts, and more. How can all of that be done quickly enough to support an agile system of cloud cluster management?
Drawing on ideas from his book Moving Hadoop to the Cloud, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered through the cloud provider to scale up. The book covers many of the techniques you need: creating instance images with most of your work baked in ahead of time, using automation to handle the rest of the work, and devising your own cloud-based metrics, tailored to Hadoop clusters, that tell you when your cluster could use more resources.
Bill then takes you even further, demonstrating how to automate the creation of entire clusters, relying on the cloud provider API and your own scripting to make it happen. Once you can automatically create new clusters, you can also trigger similar actions from your metrics to scale your cluster up in response to demand, fully harnessing cloud flexibility for effective cluster management.
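As a rough illustration of triggering a scale-up from metrics, the sketch below decides how many workers to add based on recent utilization samples. It is not taken from the talk or the book: the `ScalePolicy` fields, the threshold values, and the `workers_to_add` function are all hypothetical, and the samples are assumed to be fractions between 0 and 1 that were already fetched from the cloud provider's monitoring service. Allocating the instances themselves would go through the provider API and is deliberately left out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class ScalePolicy:
    """Hypothetical scaling policy; every value here is an illustrative assumption."""
    threshold: float = 0.80  # average utilization at or above this triggers a scale-up
    step: int = 2            # workers to add per scaling action
    max_workers: int = 20    # hard cap on cluster size to bound cost


def workers_to_add(
    samples: Sequence[float],
    current_workers: int,
    policy: ScalePolicy = ScalePolicy(),
) -> int:
    """Return how many workers to add; 0 means leave the cluster as it is."""
    if not samples:
        # No metric data yet, so take no action.
        return 0
    if mean(samples) < policy.threshold:
        # The cluster is not under sustained load.
        return 0
    # Never grow past the configured cap.
    headroom = policy.max_workers - current_workers
    return max(0, min(policy.step, headroom))


if __name__ == "__main__":
    # Sustained high utilization on a 4-worker cluster.
    print(workers_to_add([0.90, 0.95, 0.85], current_workers=4))
```

Averaging over several samples rather than reacting to a single reading is one simple way to avoid scaling on a short spike; a real policy would likely also add a cool-down period between actions.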