Securely building deep learning models for digital health data

Josh Patterson, Vartika Singh, David Kale, and Tom Hanlon walk you through interactively developing and training deep neural networks to analyze digital health data using the Cloudera Workbench and Deeplearning4j (DL4J). You'll learn how to use the Workbench to rapidly explore real-world clinical data, build data-preparation pipelines, and launch training of neural networks.


Talk Title	Securely building deep learning models for digital health data
Speakers	Josh Patterson (Patterson Consulting), Vartika Singh (Cloudera), David Kale (Skymind), Tom Hanlon (Functional Media)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 26-28, 2017
URL	Talk Page
Slides	Talk Slides

Applying deep learning in nontraditional data domains, such as electronic health record (EHR) data and medical imagery, presents a variety of challenges, including the friction that practitioners experience when transitioning from traditional data science tasks and tools to training complex neural network architectures. The areas in which deep learning reigns supreme, such as computer vision and natural language processing, often require less data exploration and use well-known preprocessing (e.g., whitening or one-hot encoding). The jump from raw data to model development is short, and practitioners can easily bridge the gap using mature, off-the-shelf tools. This makes it straightforward to build a reusable experimental pipeline that feeds into, e.g., a distributed environment designed to optimize performance over a large number of possible model architectures with little or no manual intervention.
In contrast, in data domains like healthcare, there is a wide gulf between initial data exploration and downstream model development. Health data requires significantly more upfront analysis to determine properties like data types and distributions and to identify outliers and missing values. It likewise requires the creation of more complex and typically ad hoc preprocessing pipelines. This work is best performed in an interactive environment in which practitioners can iteratively ask and answer data-driven questions, quickly view their results alongside their code, make plots and graphs, and record inline notes. However, the transition from this sort of interactive environment to developing and training large-scale neural network models is often bumpy, requiring practitioners to switch development environments and refactor their piecemeal analyses into a pipeline that can be connected to an offline model training framework.
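As a rough illustration of the upfront analysis that health data calls for, the following sketch counts missing values and flags outliers in a single numeric column using the interquartile-range (IQR) rule. The function name, the heart-rate readings, and the 1.5 x IQR threshold are illustrative assumptions, not data or methods taken from the talk.

```python
from statistics import quantiles


def profile_column(values):
    """Summarize one numeric column: missing count and IQR-based outliers.

    `None` is assumed to mark a missing reading (a hypothetical convention).
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the observed values; the IQR is the spread of the middle half.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1

    # Conventional rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    return {"missing": missing, "outliers": outliers}


# Made-up heart-rate readings in beats per minute, for illustration only.
heart_rate = [72, 68, None, 75, 80, 71, 250, None, 69, 74]
summary = profile_column(heart_rate)
print(summary)
```

In an interactive environment, a check like this would typically be run column by column, with the results guiding decisions about imputation and clipping before any model training begins.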
Cloudera Workbench gives practitioners a smooth transition from interactive data exploration to building pipelines and, eventually, to running large-scale deep learning on a traditional Hadoop cluster. The Workbench provides a secure, isolated environment for model development and collaboration and can help accelerate data science from exploration to production.
