Building a real-time metrics database for trillions of points per day
As applications have increased in complexity, so have the queries needed to understand the state and performance of those systems, leading to an explosion in the volume and dimensionality of metrics. Joel Barciauskas outlines how Datadog architected its pipelines, data structures, and storage engines to answer these complex questions, all while scaling to ingest trillions of points per day.
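To see why dimensionality drives volume: the number of distinct time series for a single metric is the product of the cardinalities of its tags, so adding one high-cardinality dimension multiplies storage and ingest load. The tag names and counts below are illustrative assumptions, not Datadog's actual data.

```python
# Hypothetical illustration: distinct series count is the product of the
# cardinality of each tag (dimension) attached to one metric name.
from math import prod

# Assumed example tag cardinalities (illustrative only).
tag_cardinalities = {
    "host": 10_000,      # e.g. short-lived containers
    "endpoint": 200,
    "customer": 5_000,   # per-customer tracking, as the abstract describes
}

distinct_series = prod(tag_cardinalities.values())
print(distinct_series)  # 10_000_000_000 potential series for one metric name
```

Dropping or aggregating away just the `customer` tag in this sketch would shrink the series count by a factor of 5,000, which is why dimensionality choices dominate cost at this scale.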
| Talk Title | Building a real-time metrics database for trillions of points per day |
|---|---|
| Speakers | Joel Barciauskas (Datadog) |
| Conference | O’Reilly Software Architecture Conference |
| Conf Tag | Engineering the Future of Software |
| Location | New York, New York |
| Date | February 24-26, 2020 |
Datadog, a monitoring and observability platform used by thousands of companies to understand how their systems behave and perform, has watched engineering organizations adopt technologies such as containers and serverless functions that shorten the lifecycle of compute infrastructure from months to minutes, if not seconds. These organizations want to understand their applications by querying across an increasing number of dimensions, all the way down to performance for an individual customer. The result has been an explosion of metrics.

As Datadog has grown, it has had to decide how to tier its architecture, storing data in multiple formats to answer different kinds of metrics questions, from high-level dashboards to granular alerting queries. To strike the right price and performance trade-off for each category of query, it uses a blend of in-house, open source, and cloud services. Joel Barciauskas details how the company uses Apache Kafka, Cassandra, object stores like S3, and in-memory databases to handle this workload.

Join in to discover how Datadog’s challenges of scale, performance, cost, and data accuracy have influenced the way it structures data, and the impact this has had on the company’s architecture.
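The tiering idea above can be sketched as a routing decision: send each query to the cheapest storage tier that can still meet its latency needs. This is a minimal sketch assuming three tiers loosely matching the stores the abstract names (in-memory for recent alerting queries, Cassandra for mid-range dashboards, an S3-like object store for long-range scans); the tier names, time cutoffs, and routing rules are hypothetical, not Datadog's actual design.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR

@dataclass
class Query:
    metric: str
    lookback_seconds: int  # how far back the query reaches
    is_alert: bool         # alert evaluation vs. dashboard/exploration

def route(query: Query) -> str:
    """Pick the cheapest tier that can still meet the query's latency needs.

    Cutoffs here are assumptions for illustration only.
    """
    if query.is_alert or query.lookback_seconds <= 2 * HOUR:
        return "in-memory"      # hot tier: lowest latency, highest cost per byte
    if query.lookback_seconds <= 15 * DAY:
        return "cassandra"      # warm tier: indexed, moderate cost
    return "object-store"       # cold tier: cheap bulk scans over S3-like storage

print(route(Query("cpu.usage", lookback_seconds=5 * 60, is_alert=True)))    # in-memory
print(route(Query("cpu.usage", lookback_seconds=7 * DAY, is_alert=False)))  # cassandra
```

The design choice this sketch reflects is that the same data point may live in several formats at once, with each tier optimized for one class of question rather than one store serving all of them.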