Building a powerful data tier from open source datastores

In the past few years, there has been a proliferation of production-ready open source databases, giving developers and operators more choices than ever. Joseph Lynch explores how Yelp has combined complementary data stores to provide a powerful data tier for its developers. Along the way, Joseph shares lessons learned in production about deployment, configuration, and monitoring.


Talk Title	Building a powerful data tier from open source datastores
Speakers	Joseph Lynch
Conference	O’Reilly Open Source Convention
Location	London, United Kingdom
Date	October 17-19, 2016

Today’s open source databases are plentiful and offer wildly different capabilities. As technology companies push traditional RDBMSs to their limits, we’ve seen significant innovation in open source “distributed first” data stores, including key-value stores, search engines, document stores, caches, and even distributed locking systems. Joseph Lynch explores how Yelp made the hard technical choices and built a bulletproof data tier from these distributed data stores.

Joseph starts with a survey of the open source data store landscape, outlining the high-level trade-offs involved in choosing between different classes of data stores (e.g., relational versus NoSQL), as well as the limitations of those choices. He then explains how Yelp chose among search engines (Elasticsearch versus Solr), configuration systems (ZooKeeper versus etcd), key-value stores (Cassandra versus HBase), and caches (MySQL versus Cassandra versus Memcached versus Redis).

Regardless of which set of open source data stores a company chooses, the hard part is getting them into production. To keep up with all the new options, Yelp invested early in building a common platform for deploying, configuring, and monitoring data stores, and Joseph discusses some of these shared abstractions.

Joseph ends by covering the implications of giving developers so many choices in data store infrastructure and shares some lessons learned about the education required in a DevOps data store world. Companies are scaling and iterating far beyond the days when one could run a single database cluster; just as monoliths are becoming microservices, catch-all databases are giving way to polyglot data stores.
