January 14, 2020


Big data analytics in the public cloud: Challenges and opportunities


Jian Zhang, Chendi Xue, and Yuan Zhou explore the challenges of migrating big data analytics workloads to the public cloud (e.g., performance loss and missing features) and demonstrate how a new in-memory data accelerator, which leverages persistent memory and RDMA NICs, can resolve these issues and enable new opportunities for big data workloads in the cloud.

Talk Title: Big data analytics in the public cloud: Challenges and opportunities
Speakers: Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: April 30-May 2, 2019

Cloud-based big data analytics is growing faster than traditional on-premises solutions because it offers excellent scalability, simpler management, and lower costs. Public cloud adoption has become the top priority for big data investments. However, performance and feature gaps remain that must be closed. Jian Zhang, Chendi Xue, and Yuan Zhou explore the performance and feature challenges of migrating big data analytics workloads to the cloud, including the disaggregated object storage commonly used by public cloud service providers (CSPs), the cloud connectors that link big data frameworks to cloud services, and compute service orchestration (e.g., running Spark on Kubernetes). They then trace the evolution of big data analytics in the public cloud and reveal the root causes of the performance gaps seen in typical workloads (TeraSort, DFSIO, TPC-DS, and k-means) across different scenarios. They conclude with a discussion of a new in-memory data accelerator, a high-performance layer that leverages state-of-the-art technologies such as persistent memory and RDMA to accelerate ephemeral data access. You’ll see promising performance numbers from prototypes that illustrate how this approach enables hybrid transactional/analytical processing (HTAP) workloads in the cloud. Along the way, you’ll learn how to leverage new hardware technologies like persistent memory and RDMA for big data analytics in the cloud.
