December 7, 2019

Making Big Data Processing Portable. The Story of Apache Beam and gRPC


Talk Title	Making Big Data Processing Portable. The Story of Apache Beam and gRPC
Speakers	Ismaël Mejía (Software Engineer, Talend)
Conference	KubeCon + CloudNativeCon Europe
Location	Copenhagen, Denmark
Date	Apr 30-May 4, 2018

Big data applications have been an almost exclusive domain of Java and Scala developers. This not only frustrates engineers who prefer other languages and their ecosystems, but also prevents companies whose business logic is already written on other platforms from reusing it when they build data-intensive applications. In this talk we introduce Apache Beam, a unified programming model designed to provide efficient and portable data processing pipelines. We will discuss in detail how Beam achieves portability by relying on two concepts: (1) runners, which translate Beam's model so it can be executed on existing systems like Apache Spark and Apache Flink, and (2) the portability APIs, an architecture of gRPC services that coordinate the execution of pipelines in containers to accomplish language portability.
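The runner idea in the abstract can be illustrated with a toy sketch: the pipeline is described once, and interchangeable runners decide how to execute it. This is not the Apache Beam API; the names `Pipeline`, `EagerRunner`, and `StreamingRunner` are hypothetical and exist only for illustration, and real runners translate pipelines for engines such as Spark or Flink rather than looping in-process.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


# Hypothetical toy model, NOT the Apache Beam API: a pipeline is an ordered
# list of element-wise transforms, described independently of any engine.
@dataclass
class Pipeline:
    steps: List[Callable] = field(default_factory=list)

    def apply(self, fn: Callable) -> "Pipeline":
        self.steps.append(fn)
        return self  # return self so calls can be chained


class EagerRunner:
    """Applies each step to the whole collection before starting the next."""

    def run(self, pipeline: Pipeline, data: Iterable) -> list:
        items = list(data)
        for step in pipeline.steps:
            items = [step(x) for x in items]
        return items


class StreamingRunner:
    """Pushes one element at a time through every step."""

    def run(self, pipeline: Pipeline, data: Iterable) -> list:
        out = []
        for x in data:
            for step in pipeline.steps:
                x = step(x)
            out.append(x)
        return out


# One pipeline definition: trim, upper-case, then measure each word.
pipeline = Pipeline().apply(str.strip).apply(str.upper).apply(len)
words = [" beam ", "grpc", " flink"]

# Two execution strategies, same result.
eager = EagerRunner().run(pipeline, words)
streamed = StreamingRunner().run(pipeline, words)
print(eager, streamed)
```

Both runners produce the same result from the same definition, which is the property that lets a pipeline move between execution engines without being rewritten.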

container flink api java grpc apache spark ecosystem big data programming pipeline

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

December 5, 2019

Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system's overall results.

Apache Spark programming

November 29, 2019

Brooke Wenig walks you through the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark's streaming capabilities and machine learning APIs.

Cuttlefish: Lightweight primitives for online tuning

November 28, 2019

Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuning primitives that use reinforcement learning to adaptively modify execution as they observe application performance over time.

Distributed deep learning with containers on heterogeneous GPU clusters

November 26, 2019

Deep learning model performance relies on the underlying data. Dong Meng offers an overview of a converged data platform that serves as a data infrastructure, providing a distributed filesystem, key-value storage and streams, and Kubernetes as an orchestration layer to manage the containers that train and deploy deep learning models on GPU clusters.

Smart agriculture: Blending IoT sensor data with visual analytics

November 21, 2019

Mike Prorock offers an overview of mesur.io, a game-changing climate awareness solution that combines smart sensor technology, data transmission, and state-of-the-art visual analytics to transform the agricultural and turf management market. Mesur.io enables growers to monitor areas of concern, providing immediate benefits to crop yield, supply costs, farm labor overhead, and water consumption.

Stream processing with Kafka

November 20, 2019

Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.