February 23, 2020


Modular convolution considered beneficial


Jack Chung, Chao Liu, and Daniel Lowell explore breaking convolution algorithms into modular pieces so that graph compilers such as Accelerated Linear Algebra (XLA) can fuse them more effectively.


Talk Title	Modular convolution considered beneficial
Speakers	Jack Chung (AMD), Chao Liu (AMD), Daniel Lowell (AMD)
Conference	O’Reilly TensorFlow World
Conf Tag
Location	Santa Clara, California
Date	October 28-31, 2019
URL	Talk Page
Slides	Talk Slides
Video

MIOpen contains performance-critical GPU kernels that drive machine learning workloads on the AMD ROCm platform. Jack Chung, Chao Liu, and Daniel Lowell explore how to break these kernels into modular pieces so they can be easily tuned for various AMD GPU hardware and tightly integrated with graph compilers such as TensorFlow XLA. They show how various convolution algorithms are implemented on AMD hardware, how they’re decomposed into modular pieces, how they can be picked up and fused by XLA, and how they perform.
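As a rough illustration of what "decomposing a convolution into modular pieces" can mean, the sketch below splits a 2D convolution into two independent stages: an unfold step (often called im2col) and a plain matrix-vector product. This is a common textbook decomposition and is only an assumption here; it is not taken from the talk and does not reflect MIOpen's or XLA's actual kernels or APIs. All function names are hypothetical, and the code uses the cross-correlation convention common in machine learning frameworks.

```python
# Minimal sketch (assumption: im2col + matrix product, not MIOpen's real design).
# Pure Python, single channel, stride 1, no padding.

def im2col(x, kh, kw):
    """Piece 1: unfold the input into one row per output position."""
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            # Flatten the kh x kw patch whose top-left corner is (i, j).
            rows.append([x[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows, out_h, out_w


def matvec(rows, vec):
    """Piece 2: a generic matrix-vector product, reusable outside convolution."""
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]


def conv2d_modular(x, k):
    """Compose the two pieces into a convolution (cross-correlation)."""
    kh, kw = len(k), len(k[0])
    rows, out_h, out_w = im2col(x, kh, kw)
    flat_k = [v for row in k for v in row]
    flat_out = matvec(rows, flat_k)
    # Fold the flat result back into an out_h x out_w grid.
    return [flat_out[i * out_w:(i + 1) * out_w] for i in range(out_h)]


def conv2d_direct(x, k):
    """Monolithic reference implementation, used only to check the result."""
    kh, kw = len(k), len(k[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [[sum(x[i + di][j + dj] * k[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]
print(conv2d_modular(x, k) == conv2d_direct(x, k))
```

Because each stage has a simple, well-defined interface, a compiler could in principle tune or fuse the stages separately, for example by merging an elementwise operation into the product step, which is the kind of benefit the abstract describes.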

gpu algorithm tensorflow machine learning performance hardware


ROCm and Hopsworks for end-to-end deep learning pipelines


February 18, 2020

The Radeon open ecosystem (ROCm) is an open source software foundation for GPU computing on Linux. ROCm supports TensorFlow and PyTorch using MIOpen, a library of highly optimized GPU routines for deep learning. Jim Dowling and Ajit Mathews outline how the open source Hopsworks framework enables the construction of horizontally scalable end-to-end machine learning pipelines on ROCm-enabled GPUs.

HARP: An efficient and elastic GPU-sharing system


February 23, 2020

Pengfei Fan and Lingling Jin offer an overview of an efficient and elastic GPU-sharing system for users who do research and development with TensorFlow.

Deep learning with Horovod and Spark using GPUs and Docker containers


February 20, 2020

Today, organizations understand the need to keep pace with new technologies when it comes to performing data science with machine learning and deep learning, but these new technologies come with their own challenges. Thomas Phelan demonstrates the deployment of TensorFlow, Horovod, and Spark using the NVIDIA CUDA stack on Docker containers in a secure multitenant environment.

Machine learning challenges at LinkedIn: Spark, TensorFlow, and beyond


February 18, 2020

From people you may know (PYMK) to economic graph research, machine learning is the oxygen that powers how LinkedIn serves its 630M+ members. Zhe Zhang provides an architectural overview of LinkedIn's typical machine learning pipelines, complemented by key types of ML use cases.

Apache Hadoop 3.x state of the union and upgrade guidance


February 16, 2020

Wangda Tan and Wei-Chiu Chuang outline the current status of the Apache Hadoop community and dive into the present and future of Hadoop 3.x. You'll get a peek at new features such as erasure coding, GPU support, NameNode federation, Docker, long-running services support, powerful container placement constraints, and data node disk balancing. They also walk you through upgrade guidance from 2.x to 3.x.

From inception to insight: Accelerating AI productivity with GPUs (sponsored by Dell Technologies)


February 5, 2020

Data scientists and machine learning engineers need the flexibility to work in multiple environments without wasting precious time configuring hardware and software and modifying code. Ramesh Radhakrishnan and John Zedlewski walk you through deploying a simple set of technologies for executing end-to-end pipelines entirely on GPUs.