Magellan: Scalable and fast geospatial analytics
How do you scale geospatial analytics on big data? And while you're at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan, a geospatial optimization engine that seamlessly integrates with Spark, and explains how it provides scalability and performance without sacrificing simplicity.
Talk Title | Magellan: Scalable and fast geospatial analytics |
Speakers | Ram Sriharsha (Databricks) |
Conference | Strata Data Conference |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 6-8, 2018 |
How do you scale geospatial analytics on big data? And while you're at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan, a geospatial optimization engine that seamlessly integrates with Spark, and explains how it provides scalability and performance without sacrificing simplicity. By leveraging space-filling curves and indexing geometric shapes on the fly, Magellan computes massive geospatial joins at scale while presenting the end user with an abstraction that hides the complexities of indexing and join optimization. Magellan has also been benchmarked to be among the fastest geospatial engines even on a single node. Ram outlines the design considerations behind Magellan, how it scales geospatial analytics without sacrificing simplicity and expressiveness, how it achieves fast single-node performance despite the usual framework overheads of Spark, and what's next for the project.
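The abstract names space-filling curves and on-the-fly indexing of shapes as the basis for scalable geospatial joins but gives no implementation detail. The Python sketch below is not Magellan's code or API; it is a minimal, single-process illustration of the general idea under stated assumptions: a Z-order (Morton) curve as the space-filling curve, a fixed-resolution grid over longitude and latitude, and a coarse bounding-box cover for each polygon. All function names (`interleave_bits`, `polygon_keys`, `spatial_join`, and so on) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: interleave the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def cell_of(lon: float, lat: float, precision: int = 8):
    """Map lon/lat to integer grid coordinates, 2**precision cells per axis."""
    n = 1 << precision
    cx = min(int((lon + 180.0) / 360.0 * n), n - 1)
    cy = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return cx, cy


def point_key(lon: float, lat: float, precision: int = 8) -> int:
    """Curve key of the single grid cell containing the point."""
    cx, cy = cell_of(lon, lat, precision)
    return interleave_bits(cx, cy, precision)


def polygon_keys(ring, precision: int = 8):
    """Curve keys of every cell touching the polygon's bounding box.

    Assumption: a bounding-box cover is used for brevity; a tighter
    cover would produce fewer false candidates.
    """
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    x0, y0 = cell_of(min(lons), min(lats), precision)
    x1, y1 = cell_of(max(lons), max(lats), precision)
    return {
        interleave_bits(x, y, precision)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    }


def contains(ring, lon: float, lat: float) -> bool:
    """Exact point-in-polygon test by ray casting (simple ring, no holes)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons, precision: int = 8):
    """Join points to polygons: equi-join on curve key, then exact refine."""
    index = defaultdict(list)
    for name, ring in polygons.items():
        for key in polygon_keys(ring, precision):
            index[key].append(name)

    matches = []
    for pid, (lon, lat) in points.items():
        for name in index.get(point_key(lon, lat, precision), []):
            if contains(polygons[name], lon, lat):
                matches.append((pid, name))
    return matches


polygons = {"square": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]}
points = {"a": (5.0, 5.0), "b": (20.0, 20.0), "c": (9.9, 0.1), "e": (10.05, 5.0)}
result = sorted(spatial_join(points, polygons))
```

The point of the key is that the expensive geometric predicate becomes an ordinary equality join followed by a cheap refinement step, which is the kind of join a distributed engine can partition by key. In this sketch, point `b` never meets the polygon because their keys differ, while point `e` shares a cell with the polygon's cover and is rejected only by the exact test.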