November 23, 2019


Magellan: Scalable and fast geospatial analytics


How do you scale geospatial analytics on big data? And while you're at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan, a geospatial optimization engine that seamlessly integrates with Spark, and explains how it provides scalability and performance without sacrificing simplicity.


Talk Title	Magellan: Scalable and fast geospatial analytics
Speakers	Ram Sriharsha (Databricks)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 6-8, 2018
URL	Talk Page
Slides	Talk Slides

How do you scale geospatial analytics on big data? And while you're at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan, a geospatial optimization engine that seamlessly integrates with Spark, and explains how it provides scalability and performance without sacrificing simplicity. By leveraging space-filling curves and indexing geometric shapes on the fly, Magellan can compute massive geospatial joins at scale while giving the end user a level of abstraction that hides the complexities of indexing, join optimizations, and similar details. Magellan has also been benchmarked to be among the fastest geospatial engines even on a single node. Ram outlines the design considerations behind Magellan, how it achieves scalability for geospatial analytics without sacrificing simplicity and expressiveness, how it delivers fast single-node performance despite the usual framework overheads of Spark, and what's next for the project.
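To make the idea of joining on a space-filling curve more concrete, here is a minimal sketch in plain Python. It is not Magellan's code or API: the function names, the fixed grid resolution, the use of a Z-order (Morton) curve, and the restriction to bounding boxes instead of real polygons are all assumptions made for illustration. Each shape is covered with grid cells, each cell is reduced to a single integer key, and points are matched to shapes by key before an exact containment check.

```python
from collections import defaultdict


def interleave_bits(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x fills the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y fills the odd bit positions
    return code


def cell_of(lon, lat, bits=8):
    """Map a lon/lat pair to (column, row) on a 2^bits x 2^bits world grid."""
    n = 1 << bits
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


def z_index(lon, lat, bits=8):
    """Z-order key of the grid cell that contains the point."""
    col, row = cell_of(lon, lat, bits)
    return interleave_bits(col, row, bits)


def cover_bbox(min_lon, min_lat, max_lon, max_lat, bits=8):
    """Z-order keys of every grid cell that overlaps the bounding box."""
    c0, r0 = cell_of(min_lon, min_lat, bits)
    c1, r1 = cell_of(max_lon, max_lat, bits)
    return {
        interleave_bits(c, r, bits)
        for c in range(c0, c1 + 1)
        for r in range(r0, r1 + 1)
    }


def join_points_to_boxes(points, boxes, bits=8):
    """Join points to boxes: match on cell key first, then check exactly."""
    index = defaultdict(list)            # cell key -> names of boxes covering it
    for name, bbox in boxes.items():
        for key in cover_bbox(*bbox, bits=bits):
            index[key].append(name)

    matches = []
    for pid, (lon, lat) in points.items():
        # Only boxes that share the point's cell are candidates.
        for name in index.get(z_index(lon, lat, bits), []):
            min_lon, min_lat, max_lon, max_lat = boxes[name]
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                matches.append((pid, name))
    return matches


# Hypothetical sample data: two points and one bounding box.
points = {"sj": (-121.89, 37.34), "nyc": (-74.0, 40.71)}
boxes = {"bay_area": (-123.0, 36.9, -121.5, 38.5)}
matches = join_points_to_boxes(points, boxes)
```

In a distributed setting the cell key would typically serve as an ordinary equi-join key, so the expensive geometric test only runs on pairs that already share a cell. How Magellan itself chooses its curve, resolution, and join strategy is not described in this abstract.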

framework spark analytics big data optimization performance scalable


Speed up mission-critical analytics in the cloud (sponsored by Kyligence)


November 20, 2019

As organizations look to scale their analytics capability, the need to grow beyond a traditional data warehouse becomes critical, and cloud-based solutions allow more flexibility while being more cost efficient. Billy Liu offers an overview of Kyligence Cloud, a managed Apache Kylin online service designed to speed up mission-critical analytics at web scale for big data.

What's new in Hadoop 3.0


November 19, 2019

Hadoop 3.0 has been years in the making, and now it's finally arriving. Andrew Wang and Daniel Templeton offer an overview of new features, including HDFS erasure coding, YARN Timeline Service v2, YARN federation, and much more, and discuss current release management status and community testing efforts dedicated to making Hadoop 3.0 the best Hadoop major release yet.

Playing well together: Big data beyond the JVM with Spark and friends


November 22, 2019

Holden Karau and Rachel Warren explore the state of the current big data ecosystem and explain how to best work with it in non-JVM languages. While much of the focus will be on Python + Spark, the talk will also include interesting anecdotes about how these lessons apply to other systems (including Kafka).

Smart agriculture: Blending IoT sensor data with visual analytics


November 21, 2019

Mike Prorock offers an overview of mesur.io, a game-changing climate awareness solution that combines smart sensor technology, data transmission, and state-of-the-art visual analytics to transform the agricultural and turf management market. Mesur.io enables growers to monitor areas of concern, providing immediate benefits to crop yield, supply costs, farm labor overhead, and water consumption.

Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams


November 20, 2019

Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.

Architecting an advanced analytics platform for machine learning


November 18, 2019

Georgios Gkekas shares ING's advanced analytics journey to promote modern machine and deep learning techniques internally through a central, best-of-breed technical platform tailored for data science activities. The platform offers only the necessary automated tools to replace the tedious, repetitive, and error-prone steps in a typical data science pipeline.