October 21, 2019

Scalable schema management for Hadoop and Spark applications

Schema plays a key role in the Hadoop architecture at Uber. Kelvin Chu and Evan Richards explain why schema is important and how it can make your Hadoop and Spark applications more reliable and efficient.

Talk Title Scalable schema management for Hadoop and Spark applications
Speakers Kelvin Chu (Uber), Evan Richards (Uber)
Conference Strata + Hadoop World
Conf Tag Big Data Expo
Location San Jose, California
Date March 29-31, 2016

Schema plays a key role in the Hadoop architecture at Uber. Uber has a complex environment with many data sources (key-value stores, Kafka, relational databases) and many combinations of data producers and consumers. Kelvin Chu and Evan Richards discuss Uber’s internal systems and tools for schema creation, inference, validation, evolution, and migration, covering both motivations and results. They share their experience implementing and optimizing Uber’s data producer clients in four languages (Python, Node.js, Java, and Go) and explain how they use Spark for efficient schema inference, data migration, and scalable computation.
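
To make the idea of schema inference and validation concrete, here is a minimal sketch in plain Python. It is not Uber's implementation and does not use Spark. The record fields (`trip_id`, `city`, `fare`) and all function names are hypothetical, chosen only for illustration. The sketch infers a per-record schema, merges the partial schemas with a function that is associative and commutative, and then checks new records against the result.

```python
from functools import reduce


def infer_record_schema(record):
    """Map each field name to the set of Python type names seen in one record."""
    return {field: {type(value).__name__} for field, value in record.items()}


def merge_schemas(left, right):
    """Combine two partial schemas by taking the union of types per field."""
    merged = {field: set(types) for field, types in left.items()}
    for field, types in right.items():
        merged.setdefault(field, set()).update(types)
    return merged


def infer_schema(records):
    """Fold per-record schemas into one schema for the whole dataset."""
    return reduce(merge_schemas, map(infer_record_schema, records), {})


def validate(record, schema):
    """Return a list of problems found when checking a record against a schema."""
    problems = []
    for field, value in record.items():
        if field not in schema:
            problems.append(f"unknown field: {field}")
        elif type(value).__name__ not in schema[field]:
            problems.append(f"unexpected type for {field}: {type(value).__name__}")
    return problems


# Hypothetical sample records; the field names are assumptions for illustration.
sample = [
    {"trip_id": 1, "city": "San Jose"},
    {"trip_id": 2, "city": "Oakland", "fare": 12.5},
    {"trip_id": 3, "city": None},
]
schema = infer_schema(sample)
print(validate({"trip_id": "4", "surge": 1.2}, schema))
```

Because `merge_schemas` is associative and commutative, the same fold could in principle be handed to a distributed reduce, such as the `reduce` operation on a Spark RDD, so that each partition infers a partial schema and the partial results are merged afterwards.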
