Securely building deep learning models for digital health data
Josh Patterson, Vartika Singh, David Kale, and Tom Hanlon walk you through interactively developing and training deep neural networks to analyze digital health data using the Cloudera Workbench and Deeplearning4j (DL4J). You'll learn how to use the Workbench to rapidly explore real-world clinical data, build data-preparation pipelines, and launch training of neural networks.
Talk Title | Securely building deep learning models for digital health data |
Speakers | Josh Patterson (Patterson Consulting), Vartika Singh (Cloudera), Dave Kale (Skymind), Tom Hanlon (Functional Media) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 26-28, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Applying deep learning in nontraditional data domains, such as electronic health record (EHR) data and medical imagery, presents a variety of challenges, including the friction that practitioners experience when transitioning from traditional data science tasks and tools to training complex neural network architectures. The areas in which deep learning reigns supreme, such as computer vision and natural language, often require less data exploration and utilize well-known preprocessing (e.g., whitening or one-hot encoding). The jump from raw data to model development is short, and practitioners can easily bridge the gap using mature, off-the-shelf tools. This makes it straightforward to build a reusable experimental pipeline that feeds into, e.g., a distributed environment designed to optimize performance over a large number of possible model architectures with little or no manual intervention.
In contrast, in data domains like healthcare, there is a wide gulf between initial data exploration and downstream model development. Health data requires significantly more upfront analysis in order to determine properties like data types and distributions and to identify outliers and missing values. This data likewise requires the creation of more complex and typically ad hoc preprocessing pipelines. This work is best performed in an interactive environment in which practitioners can iteratively ask and answer data-driven questions, quickly view their results alongside their code, make plots and graphs, and record inline notes. However, the transition from this sort of interactive environment to developing and training large-scale neural network models is often bumpy, requiring practitioners to switch development environments and refactor their piecemeal analyses into a pipeline that can be connected to an offline model training framework.
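To make the upfront analysis described above more concrete, here is a minimal sketch of profiling a single clinical measurement for missing values and outliers. It is an illustration only, not the speakers' workflow or anything specific to the Workbench or DL4J: the function name `profile_column`, the heart-rate values, the choice to treat `None` and empty strings as missing, and the 1.5 x IQR outlier rule are all assumptions made for this example.

```python
import statistics


def profile_column(values):
    """Summarize one numeric column: missing count, median, and IQR outliers.

    Hypothetical helper for illustration; it assumes missing entries are
    encoded as None or as empty strings.
    """
    # Keep only the entries that are actually present, as floats.
    present = [float(v) for v in values if v is not None and v != ""]
    missing = len(values) - len(present)

    # Quartiles of the observed values (default "exclusive" method).
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Flag values outside 1.5 * IQR of the quartiles (a common rule of
    # thumb, assumed here; real clinical data may need domain-specific limits).
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    return {
        "count": len(values),
        "missing": missing,
        "median": median,
        "outliers": outliers,
    }


# Made-up heart-rate readings with two missing entries and one implausible value.
heart_rate = [72, 68, None, 75, 70, 300, "", 66, 74, 71]
summary = profile_column(heart_rate)
print(summary)
```

In an interactive session, a check like this would typically be run per column and followed by plots, with the findings feeding into the ad hoc preprocessing steps mentioned above.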
Cloudera Workbench provides a practitioner with a smooth transition from interactive data exploration to building pipelines to eventual execution of large-scale deep learning leveraging a traditional Hadoop cluster. The Workbench provides a secure, isolated environment for model development and collaboration and can help accelerate data science from exploration to production.