Running data analytic workloads in the cloud
Vinithra Varadharajan, Jason Wang, Eugene Fratkin, and Mael Ropars detail new paradigms to effectively run production-level pipelines with minimal operational overhead. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control.
| Talk Title | Running data analytic workloads in the cloud |
|---|---|
| Speakers | Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Mael Ropars (Cloudera), Jason Wang (Cloudera) |
| Conference | Strata Data Conference |
| Conf Tag | Making Data Work |
| Location | London, United Kingdom |
| Date | May 22-24, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Over the past several years, ever-increasing quantities of data have been processed within public clouds. The cloud promises to address some of the limitations of conventional single, multipurpose clusters: it offers hyperscale storage decoupled from elastic, on-demand compute, and it allows data to be shared between processing engines such as Hive, Spark, and Impala that are provisioned on demand. But to fulfill this promise, you first need to solve several technical challenges: simple resource allocation, cross-cluster metadata sharing, and a common authorization framework. Without comprehensive answers to these challenges, the problems of the single-cluster model are simply duplicated inside a public cloud environment. As part of the deep dive, the speakers walk you through creating a production-level pipeline and executing data processing and data analytic workflows.