Scaling a user delivery network for real-time audience targeting
Adam Shepard peels back the covers on a user delivery network, a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Join in to learn about eventually consistent data architectures, tiered and hybrid storage layers, and what it takes to manage that much data at scale.
Talk Title | Scaling a user delivery network for real-time audience targeting |
Speakers | Adam Shepard (AudienceScience) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Build Resilient Distributed Systems |
Location | San Jose, California |
Date | June 20-22, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
We’ve all seen those online ads that seem to follow you around the web as soon as you visit one site or check out one product, and we know that a combination of tracking technologies accumulates that data to auction your eyeballs off to the highest bidder. There are dozens of tools and technologies out there with blazing-fast performance that can serve data to processing systems in milliseconds or less, and plenty of blog posts and marketing material to support those claims. But that’s really just the tip of the iceberg. How is that data generated? How is it managed, updated, and synchronized to provide that pinpoint targeting at internet scale?
Adam Shepard peels back the covers on a user delivery network, a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Keeping large amounts of user data in sync across multiple data centers, pulling needles of user behavior out of a haystack, and turning it all into actionable or buyable behaviors in real time present a massive infrastructure and software challenge. Adam explores AudienceScience’s journey and evolution through several iterations of processing that data at scale, including its most recent architecture, and shares lessons learned along the way.
Adam surveys some of the original infrastructures and technologies that powered AudienceScience’s user delivery network, diving into scaling and managing MySQL, Voldemort, and Cassandra and discussing the performance characteristics and trade-offs of those technologies. He also relates some difficult lessons on procuring, provisioning, and managing hybrid infrastructures.
Adam concludes by offering an overview of the current architecture of AudienceScience’s user delivery network—backed by a large-scale stream processing infrastructure with Storm, Kafka, and Spark, with data served by a purpose-built high-speed, asynchronous read-behind cache—as well as the enhancements made to the latest architecture as it’s been deployed and battle-tested at scale. Topics include: