October 25, 2019

236 words 2 mins read

Fast data made easy with Apache Kafka and Apache Kudu (incubating)

Fast data made easy with Apache Kafka and Apache Kudu (incubating)

Ted Malaska and Jeff Holoman explain how to go from zero to full-on time series and mutable-profile systems in 40 minutes. Ted and Jeff cover code examples of ingestion from Kafka and Spark Streaming and access through SQL, Spark, and Spark SQL to explore the underlying theories and design patterns that will be common for most solutions with Kudu.

Talk Title Fast data made easy with Apache Kafka and Apache Kudu (incubating)
Speakers Ted Malaska (Capital One), Jeff Holoman (Cloudera)
Conference Strata + Hadoop World
Conf Tag Big Data Expo
Location San Jose, California
Date March 29-31, 2016
URL Talk Page
Slides Talk Slides
Video

Historically, use cases such as time series and mutable-profile datasets have been possible but difficult to achieve efficiently using traditional HDFS storage engines. These solutions might involve complex ingestion paths, deep understanding of file types, and compaction strategies. With the introduction of Kudu, many of these difficulties are eliminated. At the same time, interest in streaming solutions and low-latency analytics has surged with the growing popularity of tools like Apache Kafka. Ted Malaska and Jeff Holoman explain how to go from zero to full-on time series and mutable profile systems in 40 minutes. Ted and Jeff cover code examples of ingestion from Kafka and Spark Streaming and access through SQL, Spark, and Spark SQL to explore the underlying theories and design patterns that will be common for most solutions with Kudu.

comments powered by Disqus