Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda)
The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Welcome to the future. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java. Mathew Lodge shows you how.
| Talk Title | Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda) |
|---|---|
| Speakers | Mathew Lodge (Anaconda) |
| Conference | Strata Data Conference |
| Conf Tag | Make Data Work |
| Location | New York, New York |
| Date | September 11-13, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Big data architectures like Hadoop and Spark solve the distributed database problem well, but they treat as an article of faith the idea that moving compute closer to the data is important for performance. They also assume your code is written in Java or another JVM-based language such as Scala. The big problem? Data science, predictive analytics, and ML don’t happen in JVM-based languages. They happen in Python, R, and, to a lesser extent, C/C++. Second, today’s data center networks offer 1,000 times the bandwidth they had in 2005, when Hadoop was first conceived, at a lower total cost, so data locality matters much less. Finally, all the major players, including AWS, Microsoft, Google, IBM, Red Hat, and Docker, are lined up behind Kubernetes. Containers and Kubernetes make great language-agnostic distributed computing clusters: it’s just as easy to deploy Python as it is Java. Mathew Lodge shows you how. This session is sponsored by Anaconda.