November 26, 2019

Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists

Stephen O'Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores) to understanding and choosing the right data format for the data assets to using query engines (and basic query tuning). You'll learn some new skills to help you be more productive and reduce contention with the data engineering team.

Talk Title: Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists
Speakers: Stephen O’Sullivan (Data Whisperers)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018

How much data engineering should a data scientist know? To get to the fun part of their job, data scientists normally have to do a bit of data engineering: in most cases, 50%–80% of their time is spent onboarding or wrangling data. Their work is then handed over to the data engineering team to put into production (via dev, test, and QA). However, because data scientists and data engineers develop code and models differently, the data engineering team usually has to do some modifications, rewrites, head shaking, and hand wringing to make the code production ready and meet the SLAs defined by the business.

Stephen O’Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores), to understanding and choosing the right data format for the data assets, to using query engines (and basic query tuning). You’ll learn how a distributed streaming platform works and how to take advantage of it, and you’ll explore good coding practices. Along the way, you’ll learn some new skills to help you be more productive and reduce contention with the data engineering team.
