Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML
Estimating the growth rate of tumors is an important but expensive and time-consuming part of diagnosing and treating breast cancer. Michael Dusenberry and Frederick Reiss describe how to use deep learning with Apache Spark and Apache SystemML to automate this critical image classification task.
Talk Title | Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML |
Speakers | Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 14-16, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Breast cancer is a leading cause of death in women: it affects 12% of all women, and 30–40% of patients die despite surgery. Survival rates increase with early detection, which gives pathologists and the wider medical community an incentive to detect cancer more quickly. The primary driver of early detection is the analysis of cancer proliferation, the rate at which tumor cells grow. Michael Dusenberry and Frederick Reiss share their experience using deep learning to predict tumor proliferation scores from high-resolution micrographs of tumor tissue. Scale, in terms of both data and model size, is key to achieving high accuracy in this domain. Michael and Frederick demonstrate how they use Apache SystemML’s model parallelism to scale the size of the model and Apache Spark’s data parallelism to scale the size of the training data. They then walk through how they implemented the training pipeline and present results from a seven-terabyte dataset.
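To make the data-parallelism idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the speakers' pipeline and uses neither Spark nor SystemML: in-memory lists stand in for data partitions, a one-variable linear model stands in for the network, and every name (`partition`, `partial_gradient`, `data_parallel_step`, `train`) is hypothetical. It only illustrates the general pattern of computing gradients independently per partition and then combining them.

```python
from functools import reduce


def partition(rows, num_partitions):
    """Split rows into roughly equal chunks (a stand-in for data partitions)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_gradient(rows, w, b):
    """Summed squared-error gradient for y ~ w*x + b over one partition."""
    gw = sum(2 * (w * x + b - y) * x for x, y in rows)
    gb = sum(2 * (w * x + b - y) for x, y in rows)
    return gw, gb, len(rows)


def combine(a, b):
    """Add two (grad_w, grad_b, count) triples element-wise."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def data_parallel_step(partitions, w, b, lr):
    # "Map": each partition computes its gradient independently.
    partials = [partial_gradient(p, w, b) for p in partitions]
    # "Reduce": sum the partial gradients, then average over all rows.
    gw, gb, n = reduce(combine, partials)
    return w - lr * gw / n, b - lr * gb / n


def train(rows, num_partitions=4, lr=0.05, steps=2000):
    parts = partition(rows, num_partitions)
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = data_parallel_step(parts, w, b, lr)
    return w, b


# Toy data generated from y = 3x + 1 (an illustrative assumption, not real data).
data = [(x / 10.0, 3.0 * x / 10.0 + 1.0) for x in range(20)]
w, b = train(data)
```

Because the partial gradients are summed before averaging, the update matches what a single worker would compute over the full dataset (up to floating-point rounding), which is why adding partitions scales the amount of training data without changing the result of each step.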