December 22, 2019

188 words 1 min read

Running Apache Samza on Kubernetes

Running Apache Samza on Kubernetes

Apache Samza is a distributed stream processing framework that allows you to process and analyze your data in real-time. It has been widely used at Linkedin and other companies on a large scale. Recen …

Talk Title Running Apache Samza on Kubernetes
Speakers Weiqing Yang (Software Engineer, LinkedIn)
Conference KubeCon + CloudNativeCon North America
Conf Tag
Location San Diego, CA, USA
Date Nov 15-21, 2019
URL Talk Page
Slides Talk Slides
Video

Apache Samza is a distributed stream processing framework that allows you to process and analyze your data in real-time. It has been widely used at Linkedin and other companies on a large scale. Recently, we added Kubernetes as the new scheduler backend for Samza to run in distributed mode. In this talk, we will deep dive into the technical details about how Samza runs natively on Kubernetes by leveraging the primitives provided by Kubernetes for scheduling, storages, etc. We will also compare running Samza on Kubernetes with other existing solutions such as YARN and standalone mode. Finally, we will share some practices about running Kubernetes as a container orchestration framework for other big data processing engines.

comments powered by Disqus