November 20, 2019


Streaming SQL to unify batch and stream processing: Theory and practice with Apache Flink at Uber

Fabian Hueske and Shuyi Chen explore SQL's role in the world of streaming data and its implementation in Apache Flink and cover fundamental concepts, such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges.


Talk Title	Streaming SQL to unify batch and stream processing: Theory and practice with Apache Flink at Uber
Speakers	Fabian Hueske (data Artisans), Shuyi Chen (Uber)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 6-8, 2018

SQL is the lingua franca for querying and processing data. To this day, it provides nonprogrammers with a powerful tool for analyzing and manipulating data. But with the emergence of stream processing as a core technology for data infrastructures, can you still use SQL and bring real-time data analysis to a broader audience? The answer is yes. SQL fits into the streaming world very well and forms an intuitive and powerful abstraction for streaming analytics. More importantly, you can use SQL as an abstraction to unify batch and streaming data processing. By viewing streams as dynamic tables, you can obtain consistent results from SQL evaluated over static tables and streams alike, and you can use SQL to build materialized views as a data integration tool.

Fabian Hueske and Shuyi Chen explore SQL’s role in the world of streaming data and its implementation in Apache Flink and cover fundamental concepts, such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges and how the unified stream and batch processing platform enables both technical and nontechnical users to process real-time and batch data reliably using the same SQL at Uber scale.
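The dynamic-table idea mentioned in the abstract can be illustrated with a small sketch. It is not Flink code and does not use any Flink API; it only assumes a hypothetical list of click events and uses Python's built-in sqlite3 module to stand in for a batch SQL engine. The same GROUP BY query is evaluated once over a static table and once incrementally, one event at a time, and the two results are expected to agree.

```python
import sqlite3
from collections import Counter

# Hypothetical click events (user_id, url); names and data are illustrative only.
EVENTS = [
    ("alice", "/home"),
    ("bob", "/cart"),
    ("alice", "/search"),
    ("alice", "/cart"),
    ("bob", "/home"),
]


def batch_counts(events):
    """Evaluate the query once over a static table holding all events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", events)
    rows = conn.execute(
        "SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id"
    ).fetchall()
    conn.close()
    return dict(rows)


def streaming_counts(events):
    """Maintain the same query result incrementally, one event at a time.

    Yields a snapshot of the continuously updated result table after
    each event, which is the role a materialized view plays.
    """
    view = Counter()
    for user_id, _url in events:
        view[user_id] += 1  # incremental update instead of recomputation
        yield dict(view)


snapshots = list(streaming_counts(EVENTS))
final_streaming = snapshots[-1]
final_batch = batch_counts(EVENTS)
```

Each snapshot matches what the batch query would return over the events seen so far, which is the sense in which a query over a stream and a query over a static table give consistent results. A real engine additionally has to handle event time, late data, and state management, none of which this sketch attempts.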

flink streaming apache sql infrastructure analytics uber


Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams

November 20, 2019

Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.

Stream processing with Kafka

November 20, 2019

Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.

Apache Kafka + Apache Mesos = Highly scalable streaming microservices

November 18, 2019

Kai Wähner shares a highly scalable, mission-critical infrastructure using Apache Kafka and Apache Mesos: Kafka brokers are used as the distributed messaging backbone; Kafka's Streams API embeds stream processing into any external application without the need for a dedicated streaming cluster; and Mesos is used as a scalable infrastructure to leverage the benefits of a cloud-native platform.

Speed up mission-critical analytics in the cloud (sponsored by Kyligence)

November 20, 2019

As organizations look to scale their analytics capability, the need to grow beyond a traditional data warehouse becomes critical, and cloud-based solutions allow more flexibility while being more cost efficient. Billy Liu offers an overview of Kyligence Cloud, a managed Apache Kylin online service designed to speed up mission-critical analytics at web scale for big data.

What's new in Hadoop 3.0

November 19, 2019

Hadoop 3.0 has been years in the making, and now it's finally arriving. Andrew Wang and Daniel Templeton offer an overview of new features, including HDFS erasure coding, YARN Timeline Service v2, YARN federation, and much more, and discuss current release management status and community testing efforts dedicated to making Hadoop 3.0 the best Hadoop major release yet.

Building stream processing as a service at Netflix

November 17, 2019

Steven Wu explains how Netflix's SPaaS platform empowers users to focus on extracting insights from data streams and build stream processing applications and shares lessons learned building and operating the largest SPaaS use case: Netflix's Keystone data pipeline, a self-serve platform for creating near-real-time event pipelines that processes three trillion events and 12 PB of data every day.