Faster conclusions using in-memory columnar SQL and machine learning
Hadoop's traditional batch technologies are quickly being supplanted by in-memory columnar execution, which shortens the path from data to value. Wes McKinney and Jacques Nadeau provide an overview of in-memory columnar execution, survey key related technologies, including Kudu, Ibis, Impala, and Drill, and cover a sample use case using Ibis in conjunction with Apache Drill to deliver real-time conclusions.
Talk Title | Faster conclusions using in-memory columnar SQL and machine learning |
Speakers | Wes McKinney (Two Sigma Investments), Jacques Nadeau (Dremio) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 29-31, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Data ages quickly: the longer it takes to reach a conclusion, the less value that conclusion can provide. In-memory columnar execution offers a way to combine Hadoop-scale data with real-time response. It is a powerful paradigm for analyzing large amounts of data very quickly, allowing multiple applications to share a common data representation and to perform operations using SIMD and vectorization. A number of key big data technologies, including Kudu, Ibis, Drill, and Impala, have or will soon have in-memory columnar capabilities.

Wes McKinney and Jacques Nadeau give a quick overview of how each of these tools benefits from in-memory columnar execution and then get practical, going into detail about the capabilities of Ibis and how in-memory execution can speed up key operations. Wes and Jacques explore Apache Drill as the backdrop for executing high-speed in-memory transformations and machine learning algorithms and demonstrate how a powerful columnar UDF interface lets organizations apply the performance of in-memory columnar execution to their own custom requirements.
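To make the idea of a shared columnar representation more concrete, here is a minimal sketch in plain Python. It is not the Drill or Ibis API: the `rows` data, the `columns` layout, and the `revenue_udf` function are all hypothetical names chosen for illustration. It only shows the difference between a row-oriented and a column-oriented layout, and the general shape of a function that receives whole columns rather than one record at a time. Pure Python does not use SIMD here; a real engine would run such a loop over contiguous buffers in native, vectorized code.

```python
from array import array

# Row-oriented layout: one record per element, fields interleaved.
# (Illustrative data, not taken from the talk.)
rows = [
    {"price": 10.0, "qty": 3},
    {"price": 4.5, "qty": 2},
    {"price": 7.25, "qty": 4},
]

# Column-oriented layout: one contiguous, typed buffer per field.
columns = {
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
    "qty": array("q", (r["qty"] for r in rows)),      # 64-bit signed ints
}


def revenue_udf(price, qty):
    """Hypothetical column-at-a-time UDF: takes whole columns, returns a column."""
    return array("d", (p * q for p, q in zip(price, qty)))


revenue = revenue_udf(columns["price"], columns["qty"])
total = sum(revenue)
print(list(revenue), total)
```

Because each column is a single typed buffer, a function like this touches only the fields it needs, and the same buffers could in principle be handed to another tool without converting record by record.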