Observability for data pipelines: Monitoring, alerting, and tracing lineage

Data-intensive applications, with many layers of transformations and movement from different data sources, can often be challenging to maintain and iterate on even after they are initially built and validated. Jiaqi Liu explores how to factor in monitoring, alerting, and tracing data lineage when building data applications that move and transform data across multiple dependencies.


Talk Title	Observability for data pipelines: Monitoring, alerting, and tracing lineage
Speakers	Jiaqi Liu (University of Chicago, CTDS)
Conference	O’Reilly Open Source Software Conference
Conf Tag	Fueling innovative software
Location	Portland, Oregon
Date	July 15-18, 2019

Data-intensive applications, with many layers of transformations and movement from different data sources, can often be challenging to maintain and iterate on even after they are initially built and validated. To truly expand and develop a code base, developers must be able to test confidently during development and to monitor the production system. Monitoring and testing data pipelines or real-time streaming processes can be very different from monitoring web services. Drawing on her experience building and maintaining both batch and real-time streaming data pipelines, Jiaqi Liu discusses how to use monitoring tools like Prometheus and Grafana to define and visualize metrics, how and when to alert on common health indicators, and how to gain visibility into not just the health of the system but also the health of the data. General concepts she touches on include observability of pipeline health, interpretability of data results, and building features into data pipelines that make monitoring and testing just a little easier, such as the ability to trace data lineage and designing for immutable data.
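The two design features named above, tracing data lineage and designing for immutable data, can be illustrated with a small Python sketch. This is a hypothetical illustration under stated assumptions, not the approach presented in the talk: the `Record` type, the `traced` wrapper, the `run_pipeline` helper, and the step names are all invented for this example. Each record is a frozen dataclass, so a step cannot modify its input and must return a new record, and every step appends its own name to the record's `lineage` tuple so that any output can be traced back through the steps that produced it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical immutable record that carries the names of the steps applied to it."""
    source: str                          # where the record originally came from
    fields: Tuple[Tuple[str, str], ...]  # key/value pairs stored as a tuple so they cannot be mutated
    lineage: Tuple[str, ...] = ()        # ordered names of the transformation steps applied

    def as_dict(self) -> Dict[str, str]:
        # Return a fresh dict copy; changing it does not affect the record.
        return dict(self.fields)


Transform = Callable[[Dict[str, str]], Dict[str, str]]


def traced(name: str, fn: Transform) -> Callable[[Record], Record]:
    """Wrap a dict-to-dict transformation so its output records the step name.

    The input record is never modified; a new Record is returned instead.
    """
    def apply(record: Record) -> Record:
        out = fn(record.as_dict())  # fn only ever sees a copy of the data
        return Record(
            source=record.source,
            fields=tuple(sorted(out.items())),
            lineage=record.lineage + (name,),  # extend lineage without mutating the old tuple
        )
    return apply


def run_pipeline(record: Record, steps: Iterable[Callable[[Record], Record]]) -> Record:
    # Apply each step in order; every step yields a brand-new Record.
    for step in steps:
        record = step(record)
    return record


# Hypothetical steps, for illustration only.
normalize = traced("normalize_case", lambda d: {k: v.lower() for k, v in d.items()})
tag_region = traced("add_region", lambda d: {**d, "region": "us"})

raw = Record(source="source_a", fields=(("name", "ALICE"),))
result = run_pipeline(raw, [normalize, tag_region])

print(result.as_dict())  # {'name': 'alice', 'region': 'us'}
print(result.lineage)    # ('normalize_case', 'add_region')
print(raw.as_dict())     # {'name': 'ALICE'} (the input record is left untouched)
```

Because the input record is never mutated, a suspicious run can be replayed from the same input, and the `lineage` field points to the steps worth inspecting first. In a real pipeline the lineage would more likely live in a metadata store than on each record; keeping it inline here just makes the idea visible.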

Don't Catch Feelings, Catch Issues With Kuberhealthy

Monitoring Service Architecture and Health with BPF

Enable Serverless Metrics in Apache OpenWhisk on Kubernetes with Prometheus

Lightning Talk: An ARM Based System to Monitor Server Farms Using Grafana

Serverless for data and AI

Analytics Zoo: Distributed TensorFlow in production on Apache Spark