October 4, 2019

HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster

Talk Title HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster
Speakers Junping Du (Architect, Tencent), Yi Chen (Senior Software Engineer, Tencent)
Conference KubeCon + CloudNativeCon
Location Shanghai, China
Date Jun 23-26, 2019
URL Talk Page
Slides Talk Slides

Kubernetes is not only predominant in the public cloud these days, but is also becoming a new trend in on-premises big data clusters as an alternative to Hadoop YARN, the resource scheduling component. In on-premises big data clusters, the majority of data is stored in HDFS, so consuming that data from Kubernetes is a new challenge for users. In this talk we first introduce the design and architecture of our CSI-compatible HDFS plugin. We then share our best practices and knowledge about how a big data workload such as Spark uses the HDFS CSI plugin to access HDFS data when running on K8s. Finally, we use the TPC-DS benchmark suite to compare the performance of Spark on K8s with HDFS against Spark on YARN with HDFS.
