HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster
Talk Title | HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster |
Speakers | Junping Du (Architect, Tencent), Yi Chen (Senior Software Engineer, Tencent) |
Conference | KubeCon + CloudNativeCon |
Conf Tag | |
Location | Shanghai, China |
Date | Jun 23-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Kubernetes is not only predominant in the public cloud these days; it is also becoming a trend in on-premises big data clusters as an alternative to Hadoop YARN, the resource scheduling component. In an on-premises big data cluster, the majority of data is stored in HDFS, so consuming that data from Kubernetes is a new challenge for users. In this talk, we will first introduce the design and architecture of our CSI-compatible HDFS plugin. Then we will share our best practices and knowledge about how Spark, a big data workload, uses the HDFS CSI plugin to access HDFS data when running on K8s. Finally, we will use the TPC-DS benchmark suite to compare the performance of Spark on K8s with HDFS against Spark on YARN with HDFS.