November 25, 2019


HDFS on Kubernetes: Tech deep dive on locality and security



Talk Title: HDFS on Kubernetes: Tech deep dive on locality and security
Speakers: Kimoon Kim (Pepperdata), Ilan Filonenko (Bloomberg LP)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018

There is growing interest in running Spark natively on Kubernetes, and Spark data is often stored in HDFS. Kimoon Kim and Ilan Filonenko explain how to make Spark on Kubernetes work seamlessly with HDFS by addressing challenges such as HDFS data locality and secure HDFS support. They demonstrate that the Spark scheduler can still provide HDFS data locality on Kubernetes when HDFS also runs on Kubernetes, and show how they made Spark correctly resolve the mapping from Kubernetes containers to physical nodes to HDFS datanode daemons. You'll also learn how Spark on Kubernetes interacts with secure HDFS using Kubernetes constructs such as secrets and RBAC. The same secure HDFS solution also applies when Spark on Kubernetes accesses an HDFS cluster that runs outside Kubernetes.
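To make the locality problem concrete, here is a minimal sketch in Python. It is not the speakers' implementation or Spark's scheduler code. It assumes that HDFS reports block replica locations as node hostnames while each executor pod has its own virtual IP, so a direct comparison never matches. The `ExecutorPod` record, both function names, and the sample hosts and IPs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExecutorPod:
    """Hypothetical record describing a Spark executor pod."""
    name: str
    pod_ip: str     # virtual IP assigned to the pod
    node_name: str  # cluster node the pod was scheduled onto


def naive_match(block_hosts: List[str],
                executors: List[ExecutorPod]) -> Optional[ExecutorPod]:
    """Compare replica hosts with pod IPs (assumed to never coincide)."""
    hosts = set(block_hosts)
    for pod in executors:
        if pod.pod_ip in hosts:
            return pod
    return None


def node_aware_match(block_hosts: List[str],
                     executors: List[ExecutorPod]) -> Optional[ExecutorPod]:
    """Resolve each pod to its node first, then compare with replica hosts."""
    hosts = set(block_hosts)
    for pod in executors:
        if pod.node_name in hosts:
            return pod
    return None


# Made-up sample data: two executors and one block with two replicas.
executors = [
    ExecutorPod("exec-1", "10.0.1.5", "node-a"),
    ExecutorPod("exec-2", "10.0.2.7", "node-b"),
]
block_hosts = ["node-b", "node-c"]

print(naive_match(block_hosts, executors))       # no executor looks local
print(node_aware_match(block_hosts, executors))  # exec-2 shares node-b
```

Under these assumptions, the naive comparison finds no local executor, while the node-aware lookup picks the executor that shares a node with a replica. That extra pod-to-node resolution step is the kind of mapping the talk describes.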
