December 5, 2019


Architecting a next-generation data platform


Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, and Mark Grover explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.

Talk Title: Architecting a next-generation data platform
Speakers: Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Capital One)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 23-25, 2017

Apache Hadoop is rapidly moving from its batch processing roots to a more flexible platform supporting both batch and streaming workloads. Rapid advancements in the Hadoop ecosystem are driving a dramatic evolution in both the storage and processing capabilities of the platform. While these advancements are exciting, they also add a new array of tools that architects and developers need to understand when designing solutions with Hadoop.

Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, and Mark Grover explain how to architect a modern, real-time big data platform that leverages recent advancements in open source software, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.

Along the way, they discuss considerations and best practices for using these components to implement solutions, cover common challenges and how to address them, and provide practical advice for building your own modern, real-time big data architectures.
