Pachyderm: Unlock the Power of Kubernetes for Big Data
Talk Title | Pachyderm: Unlock the Power of Kubernetes for Big Data |
Speakers | Joey Zwicker |
Conference | KubeCon + CloudNativeCon North America |
Location | Seattle, WA, United States |
Date | Nov 7-9, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Pachyderm is an open source big data analytics platform deployed entirely on Kubernetes. Pachyderm uses the Kubernetes Jobs API to process massive data workloads and build streaming pipelines. Pachyderm’s hallmark feature is version-controlled data: users can view branches, commits, and diffs for petabyte-scale data sets. In this talk we’ll demonstrate how Kubernetes and Pachyderm empower data science teams to collaborate on shared, unified data infrastructure. Everything runs on Kubernetes, from streaming data ingestion and machine learning pipelines to automatic service deployment using rolling updates. We’ll discuss why Pachyderm couldn’t exist without a wide range of advanced Kubernetes primitives, and we’ll give a demo in which we stream data through the system and watch Kubernetes automatically schedule analytics containers and parallelize the data processing. The demo is inspired directly by how production users manage data in Pachyderm today.
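To make "branches, commits and diffs" for data a little more concrete, here is a toy, in-memory sketch of the general idea. It is not Pachyderm's API or storage format: the names `Repo`, `Commit`, `diff`, and `shard` are hypothetical, files are tracked only by content hash, and deletions are ignored. The assumption being illustrated is that a pipeline can take the diff between two commits, process only the new or changed files, and spread that work across several parallel workers.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Hypothetical immutable snapshot: file path -> content hash."""
    id: str
    parent: Optional["Commit"]
    files: Dict[str, str]


@dataclass
class Repo:
    """Hypothetical toy data repository; branches are named pointers to commits."""
    branches: Dict[str, Optional[Commit]] = field(default_factory=dict)

    def commit(self, branch: str, changes: Dict[str, bytes]) -> Commit:
        parent = self.branches.get(branch)
        # Start from the parent's snapshot and overlay new or changed files.
        files = dict(parent.files) if parent else {}
        for path, data in changes.items():
            files[path] = hashlib.sha256(data).hexdigest()
        # Derive the commit id from the parent id plus the full snapshot.
        h = hashlib.sha256((parent.id if parent else "").encode())
        for path in sorted(files):
            h.update(f"{path}:{files[path]}".encode())
        new = Commit(h.hexdigest()[:12], parent, files)
        self.branches[branch] = new
        return new

    def branch(self, name: str, from_branch: str) -> None:
        # A branch is only a pointer, so creating one copies no data.
        self.branches[name] = self.branches[from_branch]


def diff(old: Optional[Commit], new: Commit) -> List[str]:
    """Paths added or modified in `new` relative to `old` (deletions ignored)."""
    old_files = old.files if old else {}
    return sorted(p for p, h in new.files.items() if old_files.get(p) != h)


def shard(paths: List[str], workers: int) -> List[List[str]]:
    """Assign each path to one worker using a stable hash of the path."""
    out: List[List[str]] = [[] for _ in range(workers)]
    for p in paths:
        idx = int(hashlib.sha256(p.encode()).hexdigest(), 16) % workers
        out[idx].append(p)
    return out


if __name__ == "__main__":
    repo = Repo()
    c1 = repo.commit("master", {"logs/day1.csv": b"a,b\n1,2\n"})
    c2 = repo.commit("master", {"logs/day2.csv": b"a,b\n3,4\n"})
    new_files = diff(c1, c2)  # only logs/day2.csv changed
    print(new_files)
    print(shard(new_files, workers=3))
```

Hashing paths to workers keeps the assignment deterministic, so re-running the same commit sends each file to the same worker. In a real Kubernetes deployment each shard would correspond to a separate container; here it is just a list.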