January 6, 2020

Stream, stream, stream: Different streaming methods with Spark and Kafka

NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way. Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals.

Talk Title: Stream, stream, stream: Different streaming methods with Spark and Kafka
Speakers: Itai Yaffe (Nielsen)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: April 30-May 2, 2019

Itai details how the company went from CSV files and standalone Java applications to multiple Kafka and Spark clusters that run a mixture of streaming and batch ETLs and support 10x data growth. Join in to hear about the company’s experience as an early adopter of Spark Streaming and Spark Structured Streaming, and how it overcame the technical barriers it faced (and there were plenty). Itai concludes by sharing a rather unique solution: using Kafka to imitate streaming over NMC’s data lake while significantly reducing cloud services costs. Topics include:
