Embracing Big Data Workload in Cloud-Native Environment with Data Locality
Talk Title | Embracing Big Data Workload in Cloud-Native Environment with Data Locality |
Speakers | Sammi Chen (Software Engineer, Tencent), Xiaoyu Yao (Principal Software Engineer, Cloudera) |
Conference | KubeCon + CloudNativeCon |
Conf Tag | |
Location | Shanghai, China |
Date | Jun 23-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Kubernetes supports scheduling workloads based on CPU and memory resources, using node affinity, pod affinity, and anti-affinity. This works very well for stateless workloads. For stateful workloads, especially big data workloads, scheduling compute close to the data source can greatly boost performance, reliability, and availability. However, in many cloud-based storage systems, data locality information is either unavailable or not exposed to the container orchestrator. In this talk, we will first compare the data locality support offered by mainstream container-attached storage for Kubernetes. Then we will introduce the network topology support in Apache Hadoop Ozone and show how to use it as locality-aware container-attached storage via the Ozone CSI plugin for better workload scheduling. Finally, we will use Spark on K8s to demo the benefits of data-locality-aware scheduling with Apache Hadoop Ozone.
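To make the idea of locality-aware scheduling concrete, here is a minimal Python sketch of how a scheduler could rank candidate nodes by their network distance to the nodes holding the data. It is an illustration only, not the Ozone CSI plugin or the Kubernetes scheduler: the function names `topology_distance` and `rank_nodes` are hypothetical, and it assumes each location is written as a Hadoop-style topology path such as `/rack1/node1`.

```python
def topology_distance(a: str, b: str) -> int:
    """Count the hops between two locations in a tree-shaped topology.

    Assumption: locations are slash-separated paths such as '/rack1/node1'.
    The distance is the number of path components that must be climbed
    from each side to reach the closest common ancestor.
    """
    parts_a = a.strip("/").split("/")
    parts_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1
    return (len(parts_a) - common) + (len(parts_b) - common)


def rank_nodes(candidates, replicas):
    """Order candidate compute nodes by distance to the nearest data replica."""
    def nearest(node):
        return min(topology_distance(node, replica) for replica in replicas)
    # sorted() is stable, so equally distant nodes keep their input order.
    return sorted(candidates, key=nearest)


if __name__ == "__main__":
    replicas = ["/rack1/node1"]  # hypothetical location of the data
    candidates = ["/rack2/node5", "/rack1/node2", "/rack1/node1"]
    print(rank_nodes(candidates, replicas))
```

Under these assumptions, a node on the same host as a replica has distance 0, a node on the same rack has distance 2, and a node on another rack has distance 4, so the scheduler would prefer them in that order.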