January 4, 2020

Using the MapD kernel for the Jupyter Notebook


Talk Title	Using the MapD kernel for the Jupyter Notebook
Speakers	Randy Zwitch (MapD)
Conference	JupyterCon in New York 2018
Conf Tag	The Official Jupyter Conference
Location	New York, New York
Date	August 22-24, 2018

MapD Core is an open source analytical SQL engine designed from the ground up to harness the parallelism inherent in GPUs, which enables queries on billions of rows of data in milliseconds. MapD Core also supports the GPU DataFrame (GDF) from the GPU Open Analytics Initiative (GoAi). The GDF is based on Apache Arrow and is designed for passing data between processes while keeping it all in GPU memory. To give data scientists a seamless experience, MapD created a Jupyter Notebook kernel extension that can be installed from a MapD-managed Conda channel.

Randy Zwitch offers an overview of the MapD kernel extension for the Jupyter Notebook and explains how to use it in a typical machine learning workflow. You'll learn how to deploy a Jupyter notebook with the MapD kernel extension, see how the MapD kernel connects to a MapD server backend, and discover how its cell magic (%%sql) executes commands on the MapD Core SQL engine. These SQL queries return their results as a GPU-resident data frame via the PyGDF library. The machine learning framework then accesses that GPU-resident DataFrame to train and test models and to make predictions.
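The workflow described above (connect to a backend, run a SQL cell, get the result back as a column-oriented frame) can be sketched roughly as follows. This is not the MapD kernel's API: an in-memory SQLite database stands in for the MapD server so the sketch runs anywhere, `sql_cell` is a hypothetical stand-in for the %%sql magic, and a plain dict of columns stands in for the GPU DataFrame. The table name and sample rows are made up for illustration.

```python
import sqlite3


def connect_backend():
    """Assumption: SQLite stands in for a MapD server backend."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flights (carrier TEXT, dep_delay REAL)")
    # Made-up sample rows, for illustration only.
    con.executemany(
        "INSERT INTO flights VALUES (?, ?)",
        [("AA", 5.0), ("AA", 15.0), ("UA", 30.0)],
    )
    return con


def sql_cell(con, query):
    """Hypothetical stand-in for the %%sql cell magic.

    Runs the query on the backend and returns the result as a dict of
    columns, mimicking the column-oriented layout of a data frame.
    """
    cur = con.execute(query)
    names = [col[0] for col in cur.description]
    rows = cur.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


con = connect_backend()
frame = sql_cell(
    con,
    "SELECT carrier, AVG(dep_delay) AS avg_delay "
    "FROM flights GROUP BY carrier ORDER BY carrier",
)
```

In the setup the talk describes, the frame would stay in GPU memory and be handed directly to the modeling framework rather than being copied into Python lists as it is here.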

Distributed TensorFlow on Hops

December 30, 2019

Fabio Buso offers demonstrations of frameworks for building distributed TensorFlow applications on the Hops platform and walks you through the whole model lifecycle, from debugging and visualizing models on TensorBoard to parallel experimentation and distributed training (with the help of Spark) to model deployment and inferencing using TensorFlow Serving and Kubernetes.

Apache Spark programming

November 29, 2019

Brooke Wenig walks you through the core APIs for using Spark, the fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark's streaming capabilities and machine learning APIs.

Deep learning 101: Apache MXNet

December 31, 2019

Simon Corston-Oliver offers an introduction to deep learning in Python using Apache MXNet. Starting with deep learning fundamentals, Simon then walks you through training and evaluating a model and explores advanced topics such as training on multiple GPUs.

Distributed systems for stream processing: Apache Kafka and Spark Streaming

December 30, 2019

Alena Hall walks you through setting up and building a distributed streaming architecture on Azure using open source frameworks like Apache Kafka and Spark Streaming. You'll use these distributed systems to process data coming from multiple sources in real time and perform machine learning tasks.

The SMACK stack on Mesosphere DC/OS using cloud infrastructure

December 24, 2019

John Dohoney and Kaitlin Carter walk you through deploying the SMACK stack on DC/OS. This architecture enables you to create modern streaming applications that combine NoSQL storage in Cassandra, message streaming with Apache Kafka, and streaming analytics with Apache Spark, implemented with Akka streaming and asynchronous Java libraries, all running on Apache Mesos under DC/OS.

Deep learning with TensorFlow and Spark using GPUs and Docker containers

December 10, 2019

In the past, you needed a high-end proprietary stack for advanced machine learning, but today you can use open source machine learning and deep learning algorithms together with distributed computing technologies like Apache Spark and GPUs. Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a stack of TensorFlow and Spark with NVIDIA CUDA on Docker containers in a multitenant environment.