December 12, 2019

Architecting a next-generation data platform

Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL and modern storage engines to enable new forms of data processing and analytics.


Talk Title	Architecting a next-generation data platform
Speakers	Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Conference	Strata Data Conference
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	May 22-24, 2018
URL	Talk Page
Slides	Talk Slides

Rapid advancements are causing a dramatic evolution in both the storage and processing capabilities of the open source big data software ecosystem. These advancements include projects like:

These storage and processing systems provide a powerful platform for implementing data processing applications on batch and streaming data. While these advancements are exciting, they also add a new array of tools that architects and developers need to understand when designing modern data processing solutions.

Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform that uses these components to reliably integrate multiple data sources, perform real-time and batch data processing, reliably store massive volumes of data, and efficiently query and process large datasets. Along the way, they discuss considerations and best practices for using these components to implement solutions, cover common challenges and how to address them, and provide practical advice for building your own modern, real-time data architectures.

Topics include:

streaming dataset ecosystem open source big data iot

StreamDM: Advanced data science with Spark Streaming

December 5, 2019

Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah's Ark Lab and Télécom ParisTech.

Smart agriculture: Blending IoT sensor data with visual analytics

November 21, 2019

Mike Prorock offers an overview of mesur.io, a game-changing climate awareness solution that combines smart sensor technology, data transmission, and state-of-the-art visual analytics to transform the agricultural and turf management market. Mesur.io enables growers to monitor areas of concern, providing immediate benefits to crop yield, supply costs, farm labor overhead, and water consumption.

Architecting data platforms for cybersecurity

December 12, 2019

Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security.

Hadoop under attack: Securing data in a banking domain

December 9, 2019

The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented.

You call it data lake; we call it Data Historian.

December 4, 2019

There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security.

You're doing it wrong: How Zoomdata rearchitected streaming

December 4, 2019

The value of real-time streaming analytics with historical data is immense. Big data application Zoomdata updates historical dashboards in real time without complex reaggregations, but streaming in the age of the IoT requires handling of data in volumes not seen in traditional feeds. Erin Recachinas explains how Zoomdata moved to a scalable microservice architecture for streaming sources.