Spark on Kubernetes for data science
Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it's an ideal fit for data scientists who may have been dissatisfied with earlier iterations of Spark, and some applications.
Talk Title | Spark on Kubernetes for data science |
Speakers | Jordan Volz (Dataiku) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 24-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Data science has benefited greatly from advances in big data and containerization technologies. Spark is the leading platform for data engineering and data science at scale. Kubernetes is the leading container orchestration service. Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Although still young and experimental, Spark on Kubernetes shows tremendous promise, and every data science organization should be aware of it.

Jordan Volz gives a brief overview of Spark and Kubernetes, explaining the history of each and why they are so crucial to the modern data scientist. He explores the Spark on Kubernetes project and why it’s an ideal fit for data scientists who may have been dissatisfied with earlier iterations of Spark. He also dives into Spark on Kubernetes as the go-to platform in cloud-native architectures, as organizations modernize their older on-premises architectures and ready them for cloud deployments. He closes with some concrete examples to whet your appetite and get you excited to go home and start experimenting with Spark on Kubernetes for yourself.