The Presto Cost-Based Optimizer for interactive SQL on anything
Presto is a popular open source distributed SQL engine for interactive queries over heterogeneous data sources (Hadoop/HDFS, Amazon S3, Azure ADLS, RDBMS, NoSQL, etc.). Wojciech Biela and Piotr Findeisen offer an overview of the Cost-Based Optimizer (CBO) for Presto, which brings a great performance boost. Join in to learn about CBO internals, the motivating use cases, and observed improvements.
Talk Title | The Presto Cost-Based Optimizer for interactive SQL on anything |
Speakers | Wojciech Biela (Starburst), Piotr Findeisen (Starburst) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | April 30-May 2, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Presto is an open source distributed SQL engine that lets users interactively query a variety of data sources, including Hadoop HDFS, object stores such as S3 and Azure Blob Storage, NoSQL stores like Cassandra, relational databases (MySQL, Postgres, SQL Server, etc.), and even Kafka streams. Presto was originally open sourced by Facebook and is now developed by a healthy open source community. It is used in production by organizations large and small across industries, wherever there are terabytes (or petabytes) of data to query or multiple data sources to federate. Presto has a proven record as the SQL-on-anything solution in terms of scalability, concurrency, and feature completeness.
Wojciech Biela and Piotr Findeisen offer an overview of Starburst’s Cost-Based Optimizer (CBO) for Presto, which brings a great performance boost. The optimizer is accompanied by a foundation layer, a framework for modeling and calculating data statistics. Both were designed from scratch to fit Presto’s architecture and code base, and together they open a new chapter in Presto’s optimizing capabilities.
Wojciech and Piotr walk you through Presto fundamentals and then detail the Cost-Based Optimizer’s concepts and architecture. Along the way, they share the motivating use cases behind this feature as well as the performance improvements it brings to Presto users. They conclude by discussing possible future improvements in this area.