December 28, 2019


Pachyderm: Unlock the Power of Kubernetes for Big Data



Talk Title: Pachyderm: Unlock the Power of Kubernetes for Big Data
Speakers: Joey Zwicker
Conference: KubeCon + CloudNativeCon North America
Location: Seattle, WA, United States
Date: Nov 7-9, 2016
URL: Talk Page
Slides: Talk Slides

Pachyderm is an open source big data analytics platform deployed entirely on Kubernetes. Pachyderm uses the Kubernetes Jobs API to process massive data workloads and build streaming pipelines. Its hallmark feature is version-controlled data, including branches, commits, and diffs for petabyte-scale data sets. In this talk we’ll demonstrate how Kubernetes and Pachyderm empower data science teams to collaborate on a shared, unified data infrastructure. Everything runs on Kubernetes, from streaming data ingestion and machine learning pipelines to automatic service deployment using rolling updates. We’ll discuss why Pachyderm couldn’t exist without a large swath of advanced Kubernetes primitives, and we’ll include a demo in which we stream data through the system and watch Kubernetes automatically schedule analytics containers and parallelize the data processing. The demo is directly inspired by how production users manage data in Pachyderm today.
