Making Big Data Processing Portable. The Story of Apache Beam and gRPC
Talk Title | Making Big Data Processing Portable. The Story of Apache Beam and gRPC |
Speakers | Ismaël Mejía (Software Engineer, Talend) |
Conference | KubeCon + CloudNativeCon Europe |
Conf Tag | |
Location | Copenhagen, Denmark |
Date | Apr 30-May 4, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Big data applications have been an almost exclusive domain of Java and Scala developers. This not only frustrates engineers who prefer other languages and their ecosystems, but also prevents companies whose business logic is already written on other platforms from reusing it when they build data-intensive applications. In this talk we introduce Apache Beam, a unified programming model designed to provide efficient and portable data processing pipelines. We will discuss in detail how Beam achieves portability by relying on two concepts: (1) runners, which translate Beam's model so it can be executed on existing systems like Apache Spark and Apache Flink, and (2) the portability APIs, an architecture of gRPC services that coordinate the execution of pipelines in containers to accomplish language portability.
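To make the runner idea concrete, here is a minimal sketch in plain Python. It is not Beam's API: `Pipeline`, `EagerRunner`, and `LazyRunner` are hypothetical names invented for illustration. It only shows the first concept from the abstract, that a pipeline can be described once, independently of any engine, and then translated by different runners. The gRPC-based portability layer is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Pipeline:
    """Hypothetical, engine-independent description of a pipeline.

    It only records which steps to apply; it never executes them itself.
    """
    steps: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, fn: Callable[[Any], bool]) -> "Pipeline":
        self.steps.append(("filter", fn))
        return self


class EagerRunner:
    """Hypothetical runner that materializes a full list after every step."""

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        items = list(data)
        for kind, fn in pipeline.steps:
            if kind == "map":
                items = [fn(x) for x in items]
            elif kind == "filter":
                items = [x for x in items if fn(x)]
            else:
                raise ValueError(f"unknown step kind: {kind}")
        return items


class LazyRunner:
    """Hypothetical runner that chains iterators and streams elements."""

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        stream = iter(data)
        for kind, fn in pipeline.steps:
            if kind == "map":
                stream = map(fn, stream)
            elif kind == "filter":
                stream = filter(fn, stream)
            else:
                raise ValueError(f"unknown step kind: {kind}")
        return list(stream)


# One pipeline definition, executed by two different "engines".
pipeline = Pipeline().map(lambda x: x * x).filter(lambda x: x % 2 == 0)
results = {
    type(runner).__name__: runner.run(pipeline, range(6))
    for runner in (EagerRunner(), LazyRunner())
}
print(results)
```

Both runners produce the same result from the same definition, which is the property that lets a pipeline move between execution engines without being rewritten.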