January 17, 2020

Tracking data lineage at Stitch Fix

Neelesh Srinivas Salian explains how Stitch Fix built a service to better understand the movement and evolution of data within the company's data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh covers why and how Stitch Fix built the service and details some use cases.

Talk Title: Tracking data lineage at Stitch Fix
Speakers: Neelesh Salian (Stitch Fix)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 11-13, 2018
URL: Talk Page
Slides: Talk Slides

Personalization allows Stitch Fix to style its clients and provide recommendations to help them find what they love. To do this, the company gathers information about a client’s preferences up front when they sign up for the service and learns more about them as they become longer-term customers. This information is important for making recommendations, but it must also be protected and managed with care.

The data science team at Stitch Fix is the primary owner of the recommendation systems. Backing them up is the data platform team, which maintains the data infrastructure, the data warehouse, and supporting tools and services. Several different data sources read from and write to this data warehouse. These include a logging pipeline for events, every Spark-based ETL, and daily snapshots of structured data from Stitch Fix applications.

Neelesh Srinivas Salian explains Stitch Fix’s process to better understand the movement and evolution of data within its data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh also details how Stitch Fix built a service that records the lineage information associated with each table in the data warehouse, so the company can trace the source, parentage, and journey of all data in the warehouse. Although Stitch Fix makes sure to anonymize and filter out sensitive information from this data, the company needs a more flexible long-term solution as the business expands.
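
To make the idea of per-table lineage concrete, here is a minimal sketch in Python of how source and parentage could be recorded and queried. It is not Stitch Fix's implementation, which the abstract does not describe. The `LineageGraph` class, its methods, and every table name below are hypothetical, and the sketch assumes each ETL run reports which tables it read and which table it wrote.

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory lineage store (hypothetical): maps each table to its direct parents."""

    def __init__(self):
        self._parents = defaultdict(set)

    def record_write(self, target, sources):
        """Record that `target` was produced by reading from `sources`."""
        self._parents[target].update(sources)

    def parents(self, table):
        """Return the direct upstream tables of `table`."""
        return set(self._parents.get(table, set()))

    def ancestors(self, table):
        """Return every upstream table, found by walking parent edges depth-first."""
        seen = set()
        stack = [table]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def sources(self, table):
        """Return ancestors with no recorded parents, i.e. the original ingestion points."""
        return {t for t in self.ancestors(table) if not self._parents.get(t)}


# Hypothetical table names, for illustration only.
graph = LineageGraph()
graph.record_write("staging.client_events", ["raw.event_log"])
graph.record_write("staging.client_profiles", ["raw.app_snapshot"])
graph.record_write(
    "analytics.recommendation_features",
    ["staging.client_events", "staging.client_profiles"],
)

print(sorted(graph.parents("analytics.recommendation_features")))
print(sorted(graph.sources("analytics.recommendation_features")))
```

Storing only direct parent edges keeps each write cheap to record, while the full journey of a table is derived on demand by traversal. A production service would also need persistence and run-level metadata, which this sketch leaves out.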
