January 16, 2020

Why and how to leverage the power and simplicity of SQL on Apache Flink

Fabian Hueske discusses why SQL is a great approach to unify batch and stream processing. He gives an update on Apache Flink's SQL support and shares some interesting use cases from large-scale production deployments. Finally, Fabian presents Flink's new query service that enables users and applications to submit streaming and batch SQL queries and retrieve low-latency updated results.


Talk Title	Why and how to leverage the power and simplicity of SQL on Apache Flink
Speakers	Fabian Hueske (Ververica)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 11-13, 2018

Everybody working with data knows SQL. Apache Flink provides SQL support for querying and processing batch and streaming data. Flink’s SQL support powers large-scale production systems at Alibaba, Huawei, and Uber. Based on Flink SQL, these companies have built systems for their internal users as well as publicly offered services for paying customers.

Fabian Hueske discusses why and how to leverage the simplicity and power of SQL on Flink. Fabian starts by exploring the use cases that Flink SQL was designed for and presents some real-world problems that it can solve. In particular, he explains why unified batch and stream processing is important and what it means to run SQL queries on streams of data.

Fabian then demonstrates how to leverage Flink’s full potential. Since the end of last year, the Flink community has been working on a service that integrates a query interface, (external) table catalogs, and result serving functionality for static, appending, and updating result sets. Fabian explores the design and features of this query service and details how it enables exploratory batch and streaming queries, ETL pipelines, and live updating query results that serve applications, such as real-time dashboards.

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

December 5, 2019

Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system's overall results.

Apache Spark programming

November 29, 2019

Brooke Wenig walks you through the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark's streaming capabilities and machine learning APIs.

Cuttlefish: Lightweight primitives for online tuning

November 28, 2019

Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuning primitives that use reinforcement learning to adaptively modify execution as they observe application performance over time.

Moving the needle of the pin: Streaming hundreds of terabytes of pins from MySQL to S3/Hadoop continuously

November 22, 2019

With the rise of large-scale real-time computation, there is a growing need to link legacy MySQL systems with real-time platforms. Henry Cai and Yi Yin offer an overview of WaterMill, Pinterest's continuous DB ingestion system for streaming SQL data into near-real-time computation pipelines to support dynamic personalized recommendations and search indices.

Stream processing with Kafka

November 20, 2019

Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.

Streaming SQL to unify batch and stream processing: Theory and practice with Apache Flink at Uber

November 20, 2019

Fabian Hueske and Shuyi Chen explore SQL's role in the world of streaming data and its implementation in Apache Flink and cover fundamental concepts, such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges.