December 13, 2019

396 words 2 mins read

Scaling a user delivery network for real-time audience targeting


Adam Shepard peels back the covers on a user delivery network, a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Join in to learn about eventually consistent data architectures, tiered and hybrid storage layers, and what it takes to manage that much data at scale.

Talk Title Scaling a user delivery network for real-time audience targeting
Speakers Adam Shepard (AudienceScience)
Conference O’Reilly Velocity Conference
Conf Tag Build Resilient Distributed Systems
Location San Jose, California
Date June 20-22, 2017
URL Talk Page
Slides Talk Slides
Video

We’ve all seen those online ads that seem to follow you around the web as soon as you visit one site or check out one product, and we know that a combination of tracking technologies accumulates that data to auction your eyeballs off to the highest bidder. There are dozens of tools and technologies out there with blazing-fast performance that can serve data to processing systems in milliseconds or less, and plenty of blog posts and marketing material to support those claims. But that’s really just the tip of the iceberg. How is that data generated? How is it managed, updated, and synchronized to provide that pinpoint targeting at internet scale?

Adam Shepard peels back the covers on a user delivery network—a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Keeping large amounts of user data in sync across multiple data centers, pulling needles of user behavior out of a haystack, and turning it all into actionable or buyable behaviors in real time presents a massive infrastructure and software challenge. Adam explores AudienceScience’s journey and evolution through several iterations of processing that data at scale, including its most recent architecture, and shares lessons learned along the way.

Adam surveys some of the original infrastructures and technologies that powered AudienceScience’s user delivery network, diving into scaling and managing MySQL, Voldemort, and Cassandra, discussing the performance characteristics of those technologies and their trade-offs, and relating some hard-won lessons on procuring, provisioning, and managing hybrid infrastructures. He concludes with an overview of the current architecture of AudienceScience’s user delivery network—backed by a large-scale stream processing infrastructure built on Storm, Kafka, and Spark, with data served by a purpose-built high-speed, asynchronous read-behind cache—as well as the enhancements made as the latest architecture has been deployed and battle-tested at scale.
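The asynchronous read-behind cache is the piece that keeps the read path inside a millisecond budget: reads are served from memory only, and a miss triggers a background load from the slow backing store so a later request for the same user can hit. The talk doesn’t publish AudienceScience’s implementation, so the Java sketch below is purely illustrative—the class, its names, and the choice of a plain thread pool are assumptions, with Cassandra standing in as the hypothetical backing store:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative sketch only, not AudienceScience's code: a read-behind cache
// whose read path never blocks on the backing store (e.g., Cassandra).
public final class ReadBehindCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, Boolean> inFlight = new ConcurrentHashMap<>();
    private final ExecutorService loaderPool = Executors.newFixedThreadPool(4);
    private final Function<K, V> loader; // slow store lookup, assumed injected

    public ReadBehindCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Millisecond-budget read path: memory only. A miss returns empty
    // immediately and schedules a background load "behind" the request.
    public Optional<V> get(K key) {
        V hit = cache.get(key);
        if (hit == null) {
            scheduleLoad(key);
        }
        return Optional.ofNullable(hit);
    }

    // Start at most one background load per key; duplicates are dropped.
    private void scheduleLoad(K key) {
        if (inFlight.putIfAbsent(key, Boolean.TRUE) == null) {
            loaderPool.submit(() -> {
                try {
                    V value = loader.apply(key);
                    if (value != null) {
                        cache.put(key, value); // next request for this key hits
                    }
                } finally {
                    inFlight.remove(key);
                }
            });
        }
    }
}
```

In a production system of the kind the talk describes, the background loader would presumably batch and rate-limit its queries, and cross-data-center freshness would come from the Kafka-fed stream pipeline updating the cache rather than from this local refresh path alone—the sketch only shows the core trade-off: a first request may see no data, in exchange for a read path that never waits on the store.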
