Scalable schema management for Hadoop and Spark applications
Schema plays a key role in the Hadoop architecture at Uber. Kelvin Chu and Evan Richards explain why schema is important and how it can make your Hadoop and Spark applications more reliable and efficient.
Talk Title | Scalable schema management for Hadoop and Spark applications |
Speakers | Kelvin Chu (Uber), Evan Richards (Uber) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 29-31, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Schema plays a key role in the Hadoop architecture at Uber. Uber runs a complex environment with many data sources (key-value stores, Kafka, relational databases) and many combinations of data producers and consumers. Kelvin Chu and Evan Richards discuss Uber’s internal systems and tools for schema creation, inference, validation, evolution, and migration, covering both the motivations and the results. They also share their experience implementing and optimizing Uber’s data producer clients in four languages (Python, Node.js, Java, and Go) and explain how they leverage Spark for efficient schema inference, data migration, and scalable computation.
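To make the ideas of schema inference and schema evolution more concrete, here is a minimal sketch in plain Python. It is not Uber's implementation, which the abstract does not describe. The function names (`infer_schema`, `merge_schemas`, `is_safe_evolution`), the sample field names, and the compatibility rule are all assumptions chosen for illustration. Inference is written as a per-record map step followed by an associative merge, which is the general shape an engine such as Spark could distribute across partitions.

```python
from functools import reduce


def type_name(value):
    """Map a Python value to a simple, Avro-like type name (illustrative only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "record"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_record_schema(record):
    """Map step: infer {field: {type names}} from a single record."""
    return {field: {type_name(value)} for field, value in record.items()}


def merge_schemas(left, right):
    """Reduce step: union the types seen for each field.

    A field that is missing on one side is treated as nullable.
    """
    merged = {}
    for field in left.keys() | right.keys():
        merged[field] = left.get(field, {"null"}) | right.get(field, {"null"})
    return merged


def infer_schema(records):
    """Infer one schema for a collection of records (empty input gives {})."""
    schemas = [infer_record_schema(r) for r in records]
    return reduce(merge_schemas, schemas) if schemas else {}


def is_safe_evolution(old, new):
    """Hypothetical rule: existing fields keep all their types, new fields are nullable."""
    for field, old_types in old.items():
        if field not in new or not old_types <= new[field]:
            return False
    return all("null" in new[field] for field in new.keys() - old.keys())


# Hypothetical sample records; the field names are made up for this example.
records = [
    {"trip_id": 1, "fare": 12.5},
    {"trip_id": 2, "fare": None, "city": "SF"},
]
schema = infer_schema(records)
```

Because `merge_schemas` is associative and commutative, partial schemas computed on separate chunks of data can be combined in any order. A real deployment would more likely rely on an established schema format and its own resolution rules rather than this hand-rolled check.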