December 1, 2019

Presto: Distributed SQL done faster

Wojciech Biela and Łukasz Osipiuk offer an introduction to Presto, an open source distributed analytical SQL engine that enables users to run interactive queries over their datasets stored in various data sources, and explore its applications in various big data problems.


Talk Title	Presto: Distributed SQL done faster
Speakers	Wojciech Biela (Starburst), Łukasz Osipiuk (Teradata)
Conference	Strata Data Conference
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	May 23-25, 2017
URL	Talk Page
Slides	Talk Slides

Interactive analysis of data stored in HDFS and other data sources has been gaining traction, and the field has grown rapidly in the past few years. Wojciech Biela and Łukasz Osipiuk offer an introduction to Presto, an open source distributed analytical SQL engine that enables users to run interactive queries over their datasets stored in various data sources, including HDFS (Hive/Hadoop), Amazon S3, and various SQL and NoSQL data stores. Presto is developed under the Apache 2.0 license. It was started at Facebook as an initiative to enable interactive querying across a variety of data stores. The project has a large and growing community of users that includes Airbnb, LinkedIn, Netflix, Twitter, and Uber. Wojciech and Łukasz explore Presto’s design fundamentals and core capabilities and cover recent functional additions to Presto as well as current and future development themes. Along the way, they also describe the major Presto installations (Facebook, Netflix, Uber) and their usage scenarios.

Presto: Distributed SQL on anything (sponsored by Teradata)

November 3, 2019

Teradata joined the Presto community in 2015 and is now a leading contributor to this open source SQL engine, originally created by Facebook. Join Kamil Bajda-Pawlikowski to learn about Presto, Teradata's recent enhancements in query performance, security integrations, and ANSI SQL coverage, and its roadmap for 2017 and beyond.

Developer on the rise: Blurring the line between developer and data scientist with PixieDust

November 26, 2019

Ready to dip your toe into data science? Va Barbosa explains why you should start with notebooks and PixieDust, a new open source library that helps data scientists and developers working with Jupyter Notebook and Apache Spark be more efficient.

Apache Kylin 2.0: From classic OLAP to real-time data warehouse

November 9, 2019

Apache Kylin, which started as a big data OLAP engine, is reaching its v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, Spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.

Architecting a next-generation data platform

November 9, 2019

Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.

Paint the landscape and secure your data center with Apache Spot

November 4, 2019

Cesar Berho and Alan Ross offer an overview of the open source project Apache Spot (incubating), which delivers a next-generation cybersecurity analytics architecture that applies unsupervised machine learning at cloud scale for anomaly detection.

PyTorch: A flexible and intuitive framework for deep learning

November 3, 2019

James Bradbury offers an overview of PyTorch, a brand-new deep learning framework from developers at Facebook AI Research that's intended to be faster, easier, and more flexible than alternatives like TensorFlow. James makes the case for PyTorch, focusing on the library's advantages for natural language processing and reinforcement learning.