November 4, 2019

Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML

Estimating the growth rate of tumors is an important but expensive and time-consuming part of diagnosing and treating breast cancer. Michael Dusenberry and Frederick Reiss describe how to use deep learning with Apache Spark and Apache SystemML to automate this critical image classification task.


Talk Title	Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML
Speakers	Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM)
Conference	Strata + Hadoop World
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 14-16, 2017

Breast cancer is a leading cause of death in women, affecting 12% of all women, with 30–40% of patients dying despite surgery. Survival rates increase with early detection, giving pathologists and the wider medical community an incentive to detect cancer more quickly. The primary driver of early detection is the analysis of cancer proliferation, the rate at which tumor cells grow.

Michael Dusenberry and Frederick Reiss share their experience using deep learning to predict tumor proliferation scores from high-resolution micrographs of tumor tissue. Scale, in terms of both data and model size, is key to achieving high accuracy in this domain. They demonstrate how they use Apache SystemML’s model parallelism to scale the size of the model and Apache Spark’s data parallelism to scale the size of the training data. They then walk you through how they implemented the training pipeline and present results from a seven-terabyte dataset.

Paint the landscape and secure your data center with Apache Spot

November 4, 2019

Cesar Berho and Alan Ross offer an overview of the open source project Apache Spot (incubating), a cybersecurity analytics architecture that applies unsupervised machine learning at cloud scale to detect anomalies.

Semantic natural language understanding at scale using Spark, machine-learned annotators, and deep-learned ontologies

November 2, 2019

David Talby and Claudiu Branzan offer a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records. Infrastructure components include Kafka, Spark Streaming, Spark, and Elasticsearch; data science components include spaCy, custom annotators, curated taxonomies, machine-learned dynamic ontologies, and real-time inferencing.

Unified, portable, efficient: Batch and stream processing with Apache Beam (incubating)

October 31, 2019

Unbounded, out-of-order, global-scale data is now the norm. Even for the same computation, each use case entails its own balance between completeness, latency, and cost. Kenneth Knowles shows how Apache Beam gives you control over this balance in a unified programming model that is portable to any Beam runner, including Apache Spark, Apache Flink, and Google Cloud Dataflow.

Machines and the magic of fast learning (sponsored by MemSQL)

November 4, 2019

Eric Frenkiel explains how to use real-time data to operationalize machine-learning models with MemSQL. He explores advanced tools, including TensorFlow, Apache Spark, and Apache Kafka, along with use cases that demonstrate the power of machine learning to effect positive change.

Sparklyr: An R interface for Apache Spark

November 2, 2019

Sparklyr makes it easy and practical to analyze big data with R. You can filter and aggregate Spark DataFrames to bring data into R for analysis and visualization, and use R to orchestrate distributed machine learning in Spark using Spark ML and H2O Sparkling Water. Edgar Ruiz walks you through these features and demonstrates how to use sparklyr to create R functions that access the full Spark API.

Streams: Successfully transforming your business one millisecond at a time

November 2, 2019

In 2016, digital advertising overtook TV in spend, requiring companies to cut through the noise to reach their audience. Manny Puentes explains how Rebel AI decides which ads to serve across devices and how it delivers multidimensional reporting in milliseconds.