Scaling and Securing Spark on Kubernetes at Bloomberg
Talk Title | Scaling and Securing Spark on Kubernetes at Bloomberg |
Speakers | Ilan Filonenko (Software Engineer, Bloomberg) |
Conference | KubeCon + CloudNativeCon Europe |
Location | Barcelona, Spain |
Date | May 19-23, 2019 |
URL | Talk Page |
Slides | Talk Slides |
In managing its Data Science Platform, Bloomberg has always focused on providing tenants with secure, reliable, and scalable solutions for their machine learning workflows and ETL pipelines. In adapting Kubernetes to support a diverse set of machine learning workloads, we decided to also support Apache Spark through its native Kubernetes integration. In this talk, we’ll discuss how we designed three components: a scalable and resilient External Shuffle Service for Dynamic Resource Allocation, a pluggable interface for secure worker creation, and a token renewal service that handles privacy and security across Spark jobs. Together, these topics address multi-tenancy, data security and privacy, and elastic resource scalability when running Spark natively on Kubernetes, with an emphasis on disaggregated compute.
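For readers unfamiliar with Dynamic Resource Allocation, the following is a minimal sketch of how a job submission might request it using standard Spark configuration properties. It is not Bloomberg's implementation. The helper names, API server address, image, namespace, and application path are hypothetical placeholders. Whether `spark.shuffle.service.enabled` has any effect on Kubernetes depends on a shuffle service actually being deployed, which is the gap the talk's External Shuffle Service is meant to fill.

```python
from typing import Dict, List


def dynamic_allocation_conf(min_executors: int, max_executors: int) -> Dict[str, str]:
    """Return standard Spark properties that request dynamic allocation.

    The property names are real Spark settings; the bounds are supplied by the caller.
    """
    if min_executors < 0 or max_executors < min_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumes an external shuffle service is reachable by the executors.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def spark_submit_args(
    api_server: str,
    image: str,
    namespace: str,
    conf: Dict[str, str],
    app: str,
) -> List[str]:
    """Assemble a spark-submit argument list for Kubernetes cluster mode."""
    args = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
    ]
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    conf = dynamic_allocation_conf(min_executors=1, max_executors=10)
    print(" ".join(spark_submit_args(
        api_server="https://example.invalid:6443",
        image="example/spark:latest",
        namespace="tenant-a",
        conf=conf,
        app="local:///opt/app/job.py",
    )))
```

Using a separate namespace per tenant, as in the placeholder above, is one common way to scope a job's resources in a multi-tenant cluster. The abstract does not say how Bloomberg isolates tenants, so this is only an illustration.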