January 25, 2020

Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda)

The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Welcome to the future. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java. Mathew Lodge shows you how.

Talk Title: Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda)
Speakers: Mathew Lodge (Anaconda)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 11-13, 2018

Big data architectures like Hadoop and Spark solve the distributed database problem well, but they take it as an article of faith that moving compute closer to the data is important for performance. They also assume your code is written in Java or another JVM-based language like Scala.

The big problem? First, data science, predictive analytics, and ML don’t happen in JVM-based languages. They happen in Python, R, and, to a lesser extent, C/C++. Second, today’s data center networks have 1,000 times the bandwidth at a lower total cost than in 2005, when Hadoop was first conceived, so data locality matters much less. Lastly, all the major players, including AWS, Microsoft, Google, IBM, Red Hat, and Docker, are lined up behind Kubernetes.

Containers and Kubernetes make great language-agnostic distributed computing clusters: it’s just as easy to deploy Python as it is Java. Mathew Lodge shows you how.

This session is sponsored by Anaconda.
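To make the "language-agnostic" point concrete, here is a minimal sketch (not taken from the talk) of how a Python workload and a Java workload could be described to Kubernetes in exactly the same way. It builds two `batch/v1` Job manifests as plain dictionaries. The helper `make_job`, the job names, the image names under `registry.example.com`, and the commands are all hypothetical placeholders chosen for illustration.

```python
import json


def make_job(name, image, command):
    """Build a minimal Kubernetes batch/v1 Job manifest as a plain dict.

    `name`, `image`, and `command` are illustrative inputs; nothing here
    is specific to the language running inside the container.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    # Jobs require a restart policy of Never or OnFailure.
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 2,
        },
    }


# Hypothetical images: one built from a conda environment, one from a JVM base.
python_job = make_job(
    "score-python",
    "registry.example.com/conda-model:latest",
    ["python", "score.py"],
)
java_job = make_job(
    "score-java",
    "registry.example.com/jvm-model:latest",
    ["java", "-jar", "score.jar"],
)

# Emit the Python job as JSON, a manifest format kubectl accepts.
print(json.dumps(python_job, indent=2))
```

The two manifests differ only in name, image, and command, which is the sense in which the cluster does not care what language runs inside the container. Saved to a file, the JSON output could be submitted with `kubectl apply -f`.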
