January 25, 2020


Circuit breakers to safeguard for garbage in, garbage out

Do your analysts always trust the insights generated by your data platform? Ensuring that insights are always reliable is critical for use cases in the financial sector. Sandeep Uttamchandani outlines a circuit breaker pattern for data pipelines, modeled on the common design pattern used in service architectures, that detects and corrects problems so that insights remain reliable.

Talk Title: Circuit breakers to safeguard for garbage in, garbage out
Speakers: Sandeep Uttamchandani (Intuit)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 11-13, 2018

Do your analysts always trust the insights generated by your data platform? Faced with an unexpected insight, does your analyst team spend time verifying data quality, ETL correctness, and job dependencies? As financial use cases increasingly incorporate social feeds, these verifications become extremely complex and difficult to scale, given the volume, velocity, and variety of the data.

The circuit breaker is a common design pattern that software developers use to handle errors gracefully in a service-oriented architecture. Taking inspiration from this pattern, Sandeep Uttamchandani outlines a circuit breaker pattern developed for data pipelines that detects and corrects problems and ensures that insights are always reliable.

The process of converting data into insights involves a multistage pipeline with ingestion, cleansing, transformation, and analytical operations. Each stage implements a circuit breaker that continuously analyzes metrics and correctness rules. If any of these are violated, the circuit is broken, and processing does not progress to the next stage in the pipeline. The checks are a collection of runtime analyses of data quality, job health, and operational error logs from the analytical engines and data stores, implemented as a combination of domain-knowledge rules and machine learning for anomaly detection. Depending on the type of error, the circuit breaker framework either attempts to repair and reschedule the job or cancels it and notifies the user.

Sandeep explains how this pattern was developed and how it is applied.
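The stage-gating behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework presented in the talk: the names `Stage`, `CheckResult`, `CircuitOpen`, `run_pipeline`, `not_null_rule`, and `row_count_anomaly` are hypothetical, a simple z-score test on batch size stands in for the machine learning anomaly detection, and "repair and reschedule" is reduced to rerunning the stage a bounded number of times.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    repairable: bool  # assumption: a failed check states whether a rerun may fix it


Check = Callable[[list], CheckResult]


def not_null_rule(column: str) -> Check:
    """Hypothetical domain-knowledge rule: every record needs a value in `column`."""
    def check(rows):
        ok = all(row.get(column) is not None for row in rows)
        return CheckResult(f"not_null({column})", ok, repairable=False)
    return check


def row_count_anomaly(history: list, z_threshold: float = 3.0) -> Check:
    """Stand-in for ML anomaly detection: a z-score test of the batch size
    against historical batch sizes (needs at least two past values)."""
    mu, sigma = mean(history), stdev(history)

    def check(rows):
        if sigma == 0:
            ok = len(rows) == mu
        else:
            ok = abs(len(rows) - mu) / sigma <= z_threshold
        # Assumption: a late or partial upstream load may fix itself on rerun.
        return CheckResult("row_count_anomaly", ok, repairable=True)
    return check


class CircuitOpen(Exception):
    """Raised when a stage's circuit is broken; later stages never run."""


@dataclass
class Stage:
    name: str
    run: Callable[[list], list]
    checks: list = field(default_factory=list)


def run_pipeline(stages, rows, max_retries=1, notify=print):
    """Run stages in order, gating each stage's output on its checks."""
    for stage in stages:
        attempt = 0
        while True:
            output = stage.run(rows)
            results = [check(output) for check in stage.checks]
            failures = [r for r in results if not r.passed]
            if not failures:
                break  # circuit stays closed: pass output to the next stage
            if all(f.repairable for f in failures) and attempt < max_retries:
                attempt += 1  # "reschedule": rerun the same stage
                continue
            # Non-repairable failure or retries exhausted: cancel and notify.
            names = ", ".join(f.name for f in failures)
            notify(f"circuit open at stage '{stage.name}': {names}")
            raise CircuitOpen(stage.name)
        rows = output
    return rows


# Example: an ingestion stage gated by the anomaly check, followed by a
# cleansing stage gated by a domain rule.
history = [100, 98, 103, 101, 99]
ingest = Stage("ingest", lambda _: [{"id": i, "amount": 1.0} for i in range(100)],
               [row_count_anomaly(history)])
cleanse = Stage("cleanse", lambda rows: [r for r in rows if r["amount"] is not None],
                [not_null_rule("amount")])
result = run_pipeline([ingest, cleanse], [])
print(len(result))  # prints 100
```

Raising an exception when a circuit opens is one way to guarantee that a failing stage never hands suspect data downstream. A production framework would also need persisted breaker state, real scheduling, and richer repair actions, which this sketch omits.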