November 28, 2019

Classifying job execution using deep learning

Ash Munshi shares techniques for labeling big data apps using runtime measurements of CPU, memory, I/O, and network, and details a deep neural network that helps operators understand the types of apps running on the cluster, better predict runtimes, tune resource utilization, and increase efficiency. The talk presents these methods as new and as the first approach to classifying multivariate time series.


Talk Title	Classifying job execution using deep learning
Speakers	Ash Munshi (Pepperdata)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 6-8, 2018
URL	Talk Page
Slides	Talk Slides

Operators of big data clusters must understand the types of applications that run on these clusters to better predict runtimes, tune resource utilization, and increase efficiency. Unfortunately, application developers seldom provide enough meaningful information to accomplish this task. Ash Munshi shares techniques for labeling big data apps using runtime measurements of CPU, memory, I/O, and network, and details a deep neural network that helps operators understand the types of apps running on the cluster. This labeling groups applications into buckets with understandable characteristics, which can then be used to reason about the cluster and its performance. For example, members of a single group can be studied to understand variability in runtime, the effects of different queue assignments, the effects of the underlying system hardware architecture, and even the effects of start times for periodic applications. The talk presents these machine learning techniques as new and as the first approach to classifying multivariate time series. The data for the models comes from observing more than 22,000 servers, sampling all of their task metrics every five seconds for months.
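
The talk does not publish its network architecture, labels, or training data, so what follows is only a minimal sketch of the general idea under stated assumptions. Each job's CPU, memory, I/O, and network samples (one every five seconds) are treated as a 4-channel time series. Jobs of different durations are linearly resampled to a fixed length. A tiny one-layer 1-D convolutional network with ReLU, global average pooling, and a softmax output, trained by plain full-batch gradient descent in NumPy (1.20 or later), stands in for the deep network described in the talk. The channel order, the `cpu_bound`/`io_bound` buckets, the synthetic data generator, and every function name and hyperparameter below are hypothetical.

```python
import numpy as np

# Toy stand-in for the talk's deep network; NOT the model presented in the talk.
# Hypothetical channel layout: one row per runtime metric, one column per
# five-second sample (the sampling interval mentioned in the talk).
CHANNELS = ("cpu", "memory", "io", "network")
# Hypothetical application buckets; the talk does not list its labels.
LABELS = ("cpu_bound", "io_bound")


def resample(series, length=64):
    """Linearly resample a (channels, T) array of any T to (channels, length)."""
    old_x = np.linspace(0.0, 1.0, series.shape[1])
    new_x = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(new_x, old_x, row) for row in series])


def forward(params, x):
    """1-D convolution over all channels -> ReLU -> global average pool -> logits."""
    w1, b1, w2, b2 = params
    windows = np.lib.stride_tricks.sliding_window_view(x, w1.shape[2], axis=2)  # (N, C, T', K)
    z = np.einsum("nctk,fck->nft", windows, w1, optimize=True) + b1[None, :, None]
    pooled = np.maximum(z, 0.0).mean(axis=2)  # (N, F): average over time
    return windows, z, pooled, pooled @ w2 + b2


def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def train(x, y, n_classes, n_filters=8, width=5, lr=0.2, epochs=500, seed=0):
    """Full-batch gradient descent on softmax cross-entropy (hand-written backprop)."""
    rng = np.random.default_rng(seed)
    n, c, _ = x.shape
    params = [rng.normal(0.0, 0.5, (n_filters, c, width)), np.zeros(n_filters),
              np.zeros((n_filters, n_classes)), np.zeros(n_classes)]
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        w1, b1, w2, b2 = params
        windows, z, pooled, logits = forward(params, x)
        d_logits = (softmax(logits) - onehot) / n
        d_w2, d_b2 = pooled.T @ d_logits, d_logits.sum(axis=0)
        # Average pooling spreads each filter's gradient evenly over T' positions.
        d_z = (d_logits @ w2.T)[:, :, None] / z.shape[2] * (z > 0)
        d_w1 = np.einsum("nft,nctk->fck", d_z, windows, optimize=True)
        d_b1 = d_z.sum(axis=(0, 2))
        params = [w1 - lr * d_w1, b1 - lr * d_b1, w2 - lr * d_w2, b2 - lr * d_b2]
    return params


def synthetic_job(rng, label):
    """Fake utilization series in [0, 1]: CPU-heavy (0) or I/O-heavy (1)."""
    steps = int(rng.integers(40, 121))  # 200-600 s of samples at 5 s intervals
    levels = np.array([(0.9, 0.3, 0.05, 0.2), (0.1, 0.3, 0.9, 0.2)][label])
    noise = rng.normal(0.0, 0.05, (len(CHANNELS), steps))
    return np.clip(levels[:, None] + noise, 0.0, 1.0)


rng = np.random.default_rng(42)
labels = np.arange(60) % 2  # balanced toy dataset
x = np.stack([resample(synthetic_job(rng, int(lab))) for lab in labels])  # (60, 4, 64)
channel_mean = x.mean(axis=(0, 2), keepdims=True)  # keep for centering new jobs
params = train(x - channel_mean, labels, n_classes=len(LABELS))
predicted = forward(params, x - channel_mean)[3].argmax(axis=1)
accuracy = float((predicted == labels).mean())
print(f"training accuracy: {accuracy:.2f}")

new_job = resample(synthetic_job(rng, 1))[None] - channel_mean
print("new job bucket:", LABELS[int(forward(params, new_job)[3].argmax())])
```

Resampling to a fixed length is only one way to handle jobs of different durations. Because the pooling layer averages over time, the same filters could also score unresampled series one job at a time. A realistic model would add depth, per-task inputs, and a held-out evaluation set.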

Cuttlefish: Lightweight primitives for online tuning

November 28, 2019

Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuning primitives that use reinforcement learning to adaptively modify execution as they observe application performance over time.

Deep learning for domain-specific entity extraction from unstructured text

November 27, 2019

Mohamed AbdelHady and Zoran Dzunic demonstrate how to build a domain-specific entity extraction system from unstructured text using deep learning. In the model, domain-specific word embedding vectors are trained on a Spark cluster using millions of PubMed abstracts and then used as features to train an LSTM recurrent neural network for entity extraction.

Human in the loop: Bayesian rules enabling explainable AI

November 25, 2019

Pramit Choudhary explores the usefulness of a generative approach that applies Bayesian inference to generate human-interpretable decision sets in the form of "if...and else" statements. These human-interpretable decision lists, which have high posterior probabilities, might strike the right balance among model interpretability, performance, and computation.

Improving user-merchant propensity modeling using neural collaborative filtering and wide and deep models on Spark BigDL at scale

November 24, 2019

Sergey Ermolin and Suqiang Song demonstrate how to use the Spark BigDL wide and deep and neural collaborative filtering (NCF) algorithms to predict a user's probability of shopping at a particular offer merchant during a campaign period. Along the way, they compare the deep learning results with those obtained by MLlib's alternating least squares (ALS) approach.

Working with the data of sports

November 18, 2019

Sports analytics today is more than a matter of analyzing box scores and play-by-play statistics. With detailed on-field or on-court data available from every game, sports teams face challenges in data management, data engineering, and analytics. Thomas Miller details the challenges faced by a Major League Baseball team as it sought competitive advantage through data science and deep learning.

Distributed deep learning with containers on heterogeneous GPU clusters

November 26, 2019

Deep learning model performance depends on the underlying data. Dong Meng offers an overview of a converged data platform that serves as a data infrastructure, providing a distributed filesystem, key-value storage and streams, and Kubernetes as the orchestration layer for managing the containers used to train and deploy deep learning models on GPU clusters.