November 25, 2019

HDFS on Kubernetes: Tech deep dive on locality and security


Talk Title	HDFS on Kubernetes: Tech deep dive on locality and security
Speakers	Kimoon Kim (Pepperdata), Ilan Filonenko (Bloomberg LP)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 6-8, 2018

There is growing interest in running Spark natively on Kubernetes, and Spark data is often stored in HDFS. Kimoon Kim and Ilan Filonenko explain how to make Spark on Kubernetes work seamlessly with HDFS by addressing challenges such as HDFS data locality and secure HDFS support. Kimoon and Ilan demonstrate how the Spark scheduler can still provide HDFS data locality when HDFS also runs on Kubernetes, and how they made Spark correctly discover the mapping from Kubernetes containers to physical nodes, and from those nodes to HDFS datanode daemons. You’ll also discover how Spark on Kubernetes interacts with secure HDFS using Kubernetes constructs such as Kubernetes secrets and RBAC. The secure HDFS solution can also be used when Spark on Kubernetes accesses an HDFS deployment that runs outside the Kubernetes cluster.
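
The locality problem described above can be illustrated with a toy sketch. It is not the speakers' implementation and uses no Spark or Kubernetes API; it only assumes that executor pods have names that differ from the physical nodes they run on, so a scheduler needs a pod-to-node mapping before it can compare executors against the hosts that hold an HDFS block replica. The function name, pod names, and node names are all hypothetical.

```python
from typing import Dict, List


def local_executors(block_hosts: List[str], pod_to_node: Dict[str, str]) -> List[str]:
    """Return the executor pods whose physical node holds a replica of the block.

    block_hosts: node names of the datanodes storing the block (hypothetical input).
    pod_to_node: executor pod name -> physical node name (hypothetical input).
    """
    replica_nodes = set(block_hosts)
    # A pod is "local" to the block if its node also hosts one of the replicas.
    return sorted(pod for pod, node in pod_to_node.items() if node in replica_nodes)


# Hypothetical cluster state, for illustration only.
pod_to_node = {
    "spark-exec-1": "node-a",
    "spark-exec-2": "node-b",
    "spark-exec-3": "node-c",
}
block_hosts = ["node-b", "node-c"]

print(local_executors(block_hosts, pod_to_node))
```

Without the pod-to-node mapping, comparing pod names directly against datanode hosts would find no match, which is the situation the talk sets out to fix.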

container cluster security spark hdfs kubernetes

How to protect big data in a containerized environment

November 25, 2019

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). However, TDE can be difficult to configure and manage, issues that are only compounded when running on Docker containers. Thomas Phelan explores these challenges and how to overcome them.

Global Container Networks on Kubernetes at DigitalOcean

November 25, 2019

Building a container network that is reliable, fast, and easy to operate has become increasingly important in DigitalOcean's distributed systems running on Kubernetes. Today's container networking tech …

Source2Image Intro

November 25, 2019

Over the years Kubernetes has had a number of different approaches to building images on or for Kubernetes. Now is a good time to take stock of Kubernetes' image building support from the perspective …

Building a Kubernetes Scheduler using Custom Metrics

November 24, 2019

The default Kubernetes scheduler does a fantastic job for typical workloads, but when you have specific requirements (like higher-level application metrics) you might need other scheduling methods. …

Entitlements: Understandable Container Security Controls

November 23, 2019

In this talk, Justin Cormack introduces a new system of security entitlements for container workloads. These specify the types of access a pod should have in a human-readable way. He will also demonstr …

Secure Pods

November 21, 2019

What is a "secure pod"? What does it mean for a Kubernetes workload to have strong isolation? With the announcement of Kata Containers and the overflowing multitenancy deep-dive at the last Kubecon, i …