Big data analytics in the public cloud: Challenges and opportunities
Jian Zhang, Chendi Xue, and Yuan Zhou explore the challenges of migrating big data analytics workloads to the public cloud (e.g., performance loss and missing features) and demonstrate how a new in-memory data accelerator that leverages persistent memory and RDMA NICs can resolve these issues and enable new opportunities for big data workloads in the cloud.
Talk Title | Big data analytics in the public cloud: Challenges and opportunities |
Speakers | Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | April 30-May 2, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Cloud-based big data analytics is growing faster than traditional on-premises solutions, as it provides excellent scalability, simplifies management, and reduces costs. Public cloud adoption has become the top priority for big data investments. However, performance and feature gaps remain and must be resolved.

Jian Zhang, Chendi Xue, and Yuan Zhou explore the performance and feature challenges caused by migrating big data analytics workloads to the cloud, including the disaggregated object storage commonly used by public CSPs, the cloud connectors that link big data frameworks to the cloud, and compute service orchestration (e.g., running Spark on Kubernetes). They then share the evolution of big data analytics in the public cloud and reveal the root causes of the performance gaps seen with typical workloads (TeraSort, DFSIO, TPC-DS, and k-means) in different scenarios. They conclude with a discussion of a new in-memory data accelerator: a high-performance layer that leverages state-of-the-art technologies like persistent memory and RDMA to accelerate ephemeral data access. You’ll see promising performance numbers from prototypes that illustrate how this approach enables hybrid transactional analytical processing (HTAP) workloads in the cloud. Along the way, you’ll learn how to leverage new hardware technologies like persistent memory and RDMA for big data analytics in the cloud.