February 20, 2020


Deep learning with Horovod and Spark using GPUs and Docker containers


Today, organizations understand the need to keep pace with new technologies for machine learning and deep learning, but those technologies bring their own challenges. Thomas Phelan demonstrates the deployment of TensorFlow, Horovod, and Spark using the NVIDIA CUDA stack on Docker containers in a secure multitenant environment.

Talk Title Deep learning with Horovod and Spark using GPUs and Docker containers
Speakers Thomas Phelan (HPE BlueData)
Conference O’Reilly Artificial Intelligence Conference
Conf Tag Put AI to Work
Location London, United Kingdom
Date October 15-17, 2019
URL Talk Page
Slides Talk Slides
Video

Data volume and complexity increase by the day, so it is imperative that companies understand their business needs in order to stay ahead of the competition. Thanks to AI, machine learning (ML), and deep learning (DL) projects such as Apache Spark, H2O, TensorFlow, and Horovod, organizations no longer have to lock into a specific vendor technology or proprietary solution to maintain this competitive advantage. These feature-rich deep learning applications are available directly from the open source community, with many different algorithms and options tailored for specific use cases.

One of the biggest challenges for the enterprise is how to deploy these open source tools in an easy and consistent manner, keeping in mind that some of them depend on specific operating system kernel and software components. For example, TensorFlow can leverage NVIDIA GPU resources, but running TensorFlow with GPUs requires users to set up the NVIDIA CUDA libraries on the host and then install and configure the TensorFlow application to make use of the GPU computing facility. The combination of device drivers, libraries, and software versions can be daunting and may end in failure for many users.

Moreover, since GPUs are a premium resource, organizations want to maximize their use. Clusters using these resources need to be configured on demand and freed immediately after computation is complete. Docker containers are ideal for enabling just this sort of instant cluster provisioning and deprovisioning; they also ensure reproducible and consistent deployment.

Thomas Phelan demonstrates how to deploy AI, ML, and DL applications, including Spark, TensorFlow, and Horovod, using GPU hardware acceleration on Docker containers in a secure multitenant environment. The use of GPU-based services within Docker containers does require some careful consideration, so he'll also explore some best practices.
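To make the container approach concrete, the kind of image the talk describes might be sketched as a Dockerfile like the one below. This is an illustrative assumption, not material from the talk itself: the base image tag and the TensorFlow and Horovod versions are examples chosen for mutual compatibility, and a real deployment would pin versions to match the host's NVIDIA driver.

```dockerfile
# Sketch only: image tag and package versions are illustrative assumptions.
# The CUDA "devel" base image bundles the CUDA toolkit and cuDNN, so the
# host needs only the NVIDIA driver and a GPU-aware container runtime.
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Build tools and MPI are needed to compile Horovod's native extensions.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-dev build-essential cmake \
        openmpi-bin libopenmpi-dev \
    && rm -rf /var/lib/apt/lists/*

# GPU build of TensorFlow plus Horovod for distributed training.
RUN pip3 install tensorflow-gpu==1.15.2 horovod==0.19.0
```

With Docker 19.03 or later, a container built from such an image can be given GPU access via `docker run --gpus all <image>`; older setups use the `nvidia-docker2` runtime instead. Because all CUDA libraries live inside the image, the same container runs identically on any GPU host, which is what makes the on-demand provisioning and deprovisioning described above practical.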
