February 22, 2020

352 words 2 mins read

Anomaly detection using deep learning to measure the quality of large datasets

Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithms and techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla examines some techniques used to evaluate the quality of data and the means to detect anomalies in the data.

Talk Title: Anomaly detection using deep learning to measure the quality of large datasets
Speakers: Sridhar Alla (BlueWhale)
Conference: O'Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: London, United Kingdom
Date: October 15-17, 2019
URL: Talk Page
Slides: Talk Slides

Sridhar Alla walks you through deep learning neural networks and various techniques you can use to detect anomalies in data. No matter which ML algorithms and modeling techniques are implemented, such as predictive analytics, clustering, Bayesian belief networks, or regression models, the effectiveness of the models depends directly on the features used, which in turn depend on the input data sources consumed. To solve this problem, modules were implemented to define the properties of the data being consumed, detect anomalies in the data, report them, and enable stakeholders to discuss and take corrective action.

Sridhar showcases how NVIDIA GPUs, Keras, and TensorFlow on Python 3.6 have pushed the limits on the amount of data that can be profiled and the anomalies that can be detected. Similar techniques were applied to time series data, particularly using LSTMs. You'll learn about deep learning-based autoencoders, unsupervised clustering, and density-based methods, and Sridhar walks through code in a Jupyter notebook showing how you can implement a similar strategy in your organization; the sketches below give a flavor of the approach.
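As a rough illustration of the autoencoder idea mentioned above, here is a minimal Keras sketch, not the speaker's actual code: a small dense autoencoder is trained to reconstruct rows assumed to be mostly normal, and rows with unusually high reconstruction error are flagged as anomalies. The feature count, layer sizes, placeholder data, and 95th-percentile threshold are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical input: rows of a profiled dataset, scaled to [0, 1].
n_features = 20

# A small dense autoencoder: it learns to reconstruct "normal" rows,
# so rows with high reconstruction error stand out as anomalies.
autoencoder = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(4, activation="relu"),   # bottleneck
    layers.Dense(8, activation="relu"),
    layers.Dense(n_features, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# Train on data assumed to be mostly normal (placeholder random data here).
x_train = np.random.rand(1000, n_features).astype("float32")
autoencoder.fit(x_train, x_train, epochs=20, batch_size=64, verbose=0)

# Score new rows by per-row reconstruction error; flag the tail.
x_new = np.random.rand(100, n_features).astype("float32")
errors = np.mean((x_new - autoencoder.predict(x_new)) ** 2, axis=1)
threshold = np.percentile(errors, 95)  # assumed cutoff: top 5% as anomalies
anomalies = np.where(errors > threshold)[0]
```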
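For the time-series case, a hedged sketch of an LSTM autoencoder along the same lines, again illustrative rather than the talk's code: sliding windows of the series are encoded and reconstructed, and windows with high reconstruction error are treated as candidate anomalies. The window length, synthetic sine-wave data, and 99th-percentile cutoff are placeholder assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical univariate series split into sliding windows of length 30.
window = 30
series = np.sin(np.linspace(0, 100, 3000)).astype("float32")  # placeholder data
x = np.stack([series[i:i + window] for i in range(len(series) - window)])
x = x[..., np.newaxis]  # shape: (samples, window, 1)

# An LSTM autoencoder: encode each window, then reconstruct it;
# windows that reconstruct poorly are candidate anomalies.
model = keras.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),                         # encoder
    layers.RepeatVector(window),             # repeat latent state per timestep
    layers.LSTM(32, return_sequences=True),  # decoder
    layers.TimeDistributed(layers.Dense(1)),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, x, epochs=5, batch_size=64, verbose=0)

# Per-window reconstruction error; flag the extreme tail.
errors = np.mean((x - model.predict(x)) ** 2, axis=(1, 2))
threshold = np.percentile(errors, 99)  # assumed cutoff
anomalous_windows = np.where(errors > threshold)[0]
```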
