Architecting a next-generation data platform
Using Customer 360 and the internet of things as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform that leverages recent advancements in the open source software world. These include components such as Kafka, Flink, Kudu, Spark Streaming, and Spark SQL, along with modern storage engines, which together enable new forms of data processing and analytics.
Talk Title | Architecting a next-generation data platform |
Speakers | Ted Malaska (Capital One), Jonathan Seidman (Cloudera) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 11-13, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Rapid advancements are causing a dramatic evolution in both the storage and processing capabilities of the open source enterprise data software ecosystem. These advancements include projects like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL, along with modern storage engines. These storage and processing systems provide a powerful platform for implementing data processing applications on both batch and streaming data. While these advancements are exciting, they also add a new array of tools that architects and developers need to understand when designing modern data processing solutions.
Using Customer 360 and the internet of things as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform that uses these components to reliably integrate multiple data sources, perform real-time and batch data processing, reliably store massive volumes of data, and efficiently query and process large datasets. Along the way, they discuss considerations and best practices for using these components to implement solutions, cover common challenges and how to address them, and provide practical advice for building your own modern, real-time data architectures.
Topics include: