Modular convolution considered beneficial
Jack Chung, Chao Liu, and Daniel Lowell explore breaking convolution algorithms into modular pieces so that graph compilers such as XLA (Accelerated Linear Algebra) can fuse them more effectively.
| Talk Title | Modular convolution considered beneficial |
| Speakers | Jack Chung (AMD), Chao Liu (AMD), Daniel Lowell (AMD) |
| Conference | O’Reilly TensorFlow World |
| Location | Santa Clara, California |
| Date | October 28–31, 2019 |
MIOpen contains the performance-critical GPU kernels that drive machine learning workloads on the AMD ROCm platform. Jack Chung, Chao Liu, and Daniel Lowell explore how to break these kernels into modular pieces so that they can be tuned easily for different AMD GPU hardware and integrated tightly with graph compilers such as TensorFlow XLA. They show how various convolution algorithms are implemented on AMD hardware, how the algorithms are decomposed into modular pieces, how XLA can pick up and fuse those pieces, and how the results perform.
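To make the idea of "modular pieces" concrete, here is a minimal sketch in plain Python. It is not MIOpen code and does not reproduce the decomposition presented in the talk. It assumes one well-known decomposition (unfold the input into patches, run a matrix-vector product, fold the result back into an image), and the function names `im2col`, `gemv`, `col2im`, `conv2d_modular`, and `conv2d_direct` are hypothetical names chosen for illustration. As in most machine learning frameworks, "convolution" here means cross-correlation, with no padding and a stride of 1.

```python
def im2col(image, kh, kw):
    """Piece 1: unfold every kh x kw window of a 2-D image into one row."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def gemv(matrix, vector):
    """Piece 2: generic matrix-vector product, unaware of convolution."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def col2im(flat, out_w):
    """Piece 3: fold the flat result back into rows of width out_w."""
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


def conv2d_modular(image, kernel):
    """Compose the three pieces into a valid (no padding) 2-D convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    return col2im(gemv(im2col(image, kh, kw), flat_kernel), out_w)


def conv2d_direct(image, kernel):
    """Monolithic reference version, used only to check the modular one."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
assert conv2d_modular(image, kernel) == conv2d_direct(image, kernel)
```

Because each piece has a narrow, well-defined job, a compiler (or a person tuning for a specific GPU) could in principle swap out or fuse one piece, for example merging the unfold step into the product, without touching the others. That is the general motivation the abstract describes; the actual pieces used on AMD hardware may look quite different.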