Stream, stream, stream: Different streaming methods with Spark and Kafka
Talk Title | Stream, stream, stream: Different streaming methods with Spark and Kafka |
Speakers | Itai Yaffe (Nielsen) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | April 30-May 2, 2019 |
URL | Talk Page |
Slides | Talk Slides |
NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way. Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals. Itai details how the company went from CSV files and standalone Java applications to multiple Kafka and Spark clusters that run a mixture of streaming and batch ETLs and support 10x data growth. Join in to hear about the company’s experience as an early adopter of Spark Streaming and Spark Structured Streaming and how it overcame the technical barriers it faced (and there were plenty). Itai concludes by sharing an unusual solution: using Kafka to imitate streaming over NMC’s data lake while significantly reducing cloud services costs. Topics include: