February 10, 2020

Spark on Kubernetes for data science

Spark on Kubernetes is a winning combination for data science, stitching together a flexible platform that harnesses the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it's an ideal fit for data scientists who may have been dissatisfied with earlier iterations of Spark, and some of its applications.

Talk Title: Spark on Kubernetes for data science
Speakers: Jordan Volz (Dataiku)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 24-26, 2019

Data science has benefited greatly from advances in big data and containerization technologies. Spark is the leading platform for data engineering and data science at scale, and Kubernetes is the leading container orchestration service. Spark on Kubernetes is a winning combination for data science, stitching together a flexible platform that harnesses the best of both worlds. Although the project is still young and experimental, it shows tremendous promise, and every data science organization should be aware of it. Jordan Volz gives a brief overview of Spark and Kubernetes, explaining the history of each and why they are so crucial to the modern data scientist. He explores the Spark on Kubernetes project and why it’s an ideal fit for data scientists who may have been dissatisfied with earlier iterations of Spark. He also examines Spark on Kubernetes as the go-to platform for cloud-native architectures, as organizations modernize their older on-premises architectures and ready them for cloud deployments. He closes with concrete examples to whet your appetite and get you excited to go home and start experimenting with Spark on Kubernetes yourself.
