Presto: Tuning performance of SQL-on-anything analytics

Kamil Bajda-Pawlikowski and Martin Traverso explore Presto's recently introduced cost-based optimizer, which must account for heterogeneous inputs with differing and often incomplete data statistics, and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.


Talk Title	Presto: Tuning performance of SQL-on-anything analytics
Speakers	Kamil Bajda-Pawlikowski (Starburst), Martin Traverso (Presto Software Foundation)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Francisco, California
Date	March 26-28, 2019

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. It has been proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, and in the last few years it has seen unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. With an ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, Presto’s recently introduced cost-based optimizer must account for heterogeneous inputs with differing and often incomplete data statistics. Kamil Bajda-Pawlikowski and Martin Traverso explore this topic and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.