November 4, 2019


Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML

Estimating how fast a tumor is growing is an essential but expensive and time-consuming part of diagnosing and treating breast cancer. Michael Dusenberry and Frederick Reiss describe how to use deep learning with Apache Spark and Apache SystemML to automate this critical image classification task.

Talk Title: Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML
Speakers: Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM)
Conference: Strata + Hadoop World
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 14-16, 2017

Breast cancer is a leading cause of death in women, affecting 12% of all women, with 30–40% of patients dying despite surgery. Survival rates increase with early detection, which gives pathologists and the wider medical community a strong incentive to detect cancer more quickly. The primary driver of early detection is the analysis of tumor proliferation, the rate at which tumor cells grow.

Michael Dusenberry and Frederick Reiss share their experience using deep learning to predict tumor proliferation scores from high-resolution micrographs of tumor tissue. Scale, in terms of both data and model size, is key to achieving high accuracy in this domain. They demonstrate how they use Apache SystemML's model parallelism to scale the size of the model and Apache Spark's data parallelism to scale the size of the training data. They then walk through how they implemented the training pipeline and present results from a seven-terabyte dataset.
