November 26, 2019

Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists

Stephen O'Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores) to understanding and choosing the right data format for the data assets to using query engines (and basic query tuning). You'll learn some new skills to help you be more productive and reduce contention with the data engineering team.


Talk Title	Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists
Speakers	Stephen O’Sullivan (Data Whisperers)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 6-8, 2018

How much data engineering should a data scientist know? To get to the fun part of their job, data scientists normally have to do a bit of data engineering; in most cases, 50%–80% of their time is spent onboarding or wrangling data. The work is then handed over to the data engineering team to put into production (via dev, test, and QA). However, in most cases, the data engineering team has to do some modifications, rewrites, head shaking, and hand wringing to make the code production ready and meet the SLAs defined by the business, because there is a disconnect in how data scientists and data engineers develop code and models.

Stephen O’Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores) to understanding and choosing the right data format for the data assets to using query engines (and basic query tuning). You’ll learn how a distributed streaming platform works and how to take advantage of it, and you'll explore good coding practices. Along the way, you’ll learn some new skills to help you be more productive and reduce contention with the data engineering team.

Tags: code, data engineering, streaming, data science

Working with the data of sports

November 18, 2019

Sports analytics today is more than a matter of analyzing box scores and play-by-play statistics. With detailed on-field or on-court data available from every game, sports teams face challenges in data management, data engineering, and analytics. Thomas Miller details the challenges faced by a Major League Baseball team as it sought competitive advantage through data science and deep learning.

Playing well together: Big data beyond the JVM with Spark and friends

November 22, 2019

Holden Karau and Rachel Warren explore the state of the current big data ecosystem and explain how to best work with it in non-JVM languages. While much of the focus will be on Python + Spark, the talk will also include interesting anecdotes about how these lessons apply to other systems (including Kafka).

Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams

November 20, 2019

Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.

Differentiating via data science

November 26, 2019

While companies often use data science as a supportive function, the emergence of new business models has made it possible for some companies to differentiate via data science. Eric Colson explores what it means to differentiate by data science and explains why companies must now think very differently about the role and placement of data science in the organization.

Explore Graph Algorithms for Data Science with Neo4j

November 26, 2019

This talk provides a very quick overview of graph theory and the graph algorithms available in Neo4j.

From the presidential campaign trail to the enterprise: Building effective data-driven teams

November 26, 2019

The 2012 Obama campaign ran the first personalized presidential campaign in history. The data team was made up of people from diverse backgrounds who embraced data science in service of the goal. Civis Analytics emerged from this team and today enables organizations to use the same methods outside politics. Katie Malone shares lessons learned from these experiences for building effective teams.