Running multidisciplinary big data workloads in the cloud
Attend this tutorial to learn how to run a data analytics pipeline in the cloud, integrate data engineering and data analytic workflows, and explore considerations and best practices for data analytics pipelines in the cloud. Along the way, you'll see how to share metadata across workloads in a big data PaaS.
Talk Title | Running multidisciplinary big data workloads in the cloud |
Speakers | Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 11-13, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Organizations now run diverse, multidisciplinary big data workloads that span data engineering, analytic database, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long-running. One challenge is keeping the data context consistent across these workloads. Sudhanshu Arora, Stefan Salandy, Suraj Acharya, Brandon Freeman, Jason Wang, and Shravan Pabba demonstrate how to manage the shared data experience so that it stays consistent across all workloads. You’ll learn how to run a data analytics pipeline in the cloud, integrate data engineering and data analytic workflows, and explore considerations and best practices for data analytics pipelines in the cloud. Along the way, you’ll see how to share metadata across workloads in a big data PaaS. You’ll use the Cloudera Altus PaaS offering, powered by Cloudera Altus SDX, to run various big data workloads.