Presto: Tuning performance of SQL-on-anything analytics
Kamil Bajda-Pawlikowski and Martin Traverso explore Presto's recently introduced cost-based optimizer, which must account for heterogeneous inputs with differing and often incomplete data statistics, and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap.
Talk Title | Presto: Tuning performance of SQL-on-anything analytics |
Speakers | Kamil Bajda-Pawlikowski (Starburst), Martin Traverso (Presto Software Foundation) |
Conference | Strata Data Conference |
Conf Tag | Big Data Expo |
Location | San Francisco, California |
Date | March 26-28, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. It has been proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, and in the last few years it has experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. With an ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, Presto’s recently introduced cost-based optimizer must account for heterogeneous inputs with differing and often incomplete data statistics. Kamil Bajda-Pawlikowski and Martin Traverso explore this topic and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap.