Anomaly detection using deep learning to measure the quality of large datasets
Any business, big or small, depends on analytics, whether the goal is generating revenue, reducing churn, or supporting sales and marketing. Whatever algorithms and techniques are used, the results depend on the accuracy and consistency of the data being processed. Sridhar Alla examines some techniques for evaluating data quality and detecting anomalies in the data.
| Talk Title | Anomaly detection using deep learning to measure the quality of large datasets |
|---|---|
| Speakers | Sridhar Alla (BlueWhale) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | Put AI to Work |
| Location | London, United Kingdom |
| Date | October 15-17, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Sridhar Alla walks you through deep neural networks and various techniques you can use to detect anomalies in data.

To derive value from data, whatever ML algorithms and modeling techniques are used (predictive analytics, clustering, Bayesian belief networks, or regression models), the effectiveness of the models ultimately depends on the features used, which in turn depend on the input data sources consumed. To address this problem, modules were implemented that define the properties of the data being consumed, detect anomalies in it, report them, and enable stakeholders to discuss the issues and take corrective action.

Sridhar showcases how NVIDIA GPUs, combined with Keras and TensorFlow on Python 3.6, have pushed the limits on the amount of data that can be profiled and checked for anomalies. Similar techniques were applied to time series data, particularly using LSTM networks. You’ll learn about deep learning-based autoencoders, unsupervised clustering, and density-based methods. Sridhar also demonstrates code in a Jupyter notebook showing how you can implement a similar strategy in your organization.
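The abstract does not say how the profiling modules define the properties of the data. As a hedged sketch only, assume those properties are simple per-column statistics learned from a trusted reference batch, and that a value in a new batch is reported if it is missing or lies more than three standard deviations from the reference mean. The helpers `profile` and `find_anomalies`, the row-as-dict layout, and the 3-sigma threshold are hypothetical choices for illustration, not the talk's implementation.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # hypothetical layout: one dict per record, None = missing


def profile(rows: List[Row]) -> Dict[str, dict]:
    """Learn per-column properties from a reference batch (hypothetical helper)."""
    props = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows if r.get(col) is not None]
        props[col] = {
            "null_rate": 1 - len(values) / len(rows),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),  # population standard deviation
        }
    return props


def find_anomalies(rows: List[Row], props: Dict[str, dict], z: float = 3.0) -> List[Tuple[int, str, str]]:
    """Report (row_index, column, reason) for missing values or values beyond z standard deviations."""
    report = []
    for i, row in enumerate(rows):
        for col, p in props.items():
            value = row.get(col)
            if value is None:
                report.append((i, col, "missing"))
            elif p["std"] > 0 and abs(value - p["mean"]) / p["std"] > z:
                report.append((i, col, "outlier"))
    return report


# Usage with made-up data: learn properties, then check a new batch.
reference = [{"amount": float(x), "qty": float(x % 5 + 1)} for x in range(100)]
props = profile(reference)
new_batch = [{"amount": 50.0, "qty": 3.0}, {"amount": 5000.0, "qty": None}]
print(find_anomalies(new_batch, props))  # → [(1, 'amount', 'outlier'), (1, 'qty', 'missing')]
```

A 3-sigma rule is easy to explain to stakeholders, but it assumes roughly bell-shaped columns; skewed or multimodal columns usually need per-column thresholds or a learned model such as the autoencoders the talk covers.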
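The talk builds deep autoencoders with Keras and TensorFlow, but the abstract gives no architecture. The core idea, scoring each record by how poorly a model trained on normal data reconstructs it, can be sketched without a deep learning framework. The sketch below swaps in a linear autoencoder fitted in closed form with PCA (via NumPy's SVD) as a stand-in; a deep Keras model would replace `reconstruct` with a trained encoder and decoder. The class name `LinearAutoencoder`, the 99th-percentile threshold, and the synthetic data are assumptions.

```python
import numpy as np


class LinearAutoencoder:
    """Linear autoencoder fitted with PCA (hypothetical stand-in for a deep Keras model)."""

    def __init__(self, n_components: int):
        self.n_components = n_components  # width of the bottleneck

    def fit(self, X: np.ndarray) -> "LinearAutoencoder":
        self.mean_ = X.mean(axis=0)
        # Rows of vt are principal directions, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]  # used as tied encoder/decoder weights
        return self

    def reconstruct(self, X: np.ndarray) -> np.ndarray:
        codes = (X - self.mean_) @ self.components_.T  # encode into the bottleneck
        return codes @ self.components_ + self.mean_   # decode back to input space

    def score(self, X: np.ndarray) -> np.ndarray:
        """Per-row mean squared reconstruction error; large values suggest anomalies."""
        return np.mean((X - self.reconstruct(X)) ** 2, axis=1)


# Synthetic "normal" data: 6 features driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(500, 6))

model = LinearAutoencoder(n_components=2).fit(normal)
threshold = np.percentile(model.score(normal), 99)  # assumed cut-off: 99th percentile of training error

suspect = 3.0 * rng.normal(size=(1, 6))  # a record that ignores the 2-factor structure
print(model.score(suspect) > threshold)
```

Choosing the threshold from the training-error distribution means roughly 1% of normal records are flagged by design; in practice that rate would be tuned with the stakeholders who review the reports.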
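Density-based methods are mentioned without detail. One simple, assumed example is a k-nearest-neighbour distance score: a point whose k-th nearest neighbour is far away sits in a low-density region and is a candidate anomaly. The function `knn_outlier_scores` and the choice k = 5 are hypothetical; the pairwise-distance matrix uses O(n²) memory, so for large datasets a spatial index or a method such as DBSCAN or LOF would be more appropriate.

```python
import numpy as np


def knn_outlier_scores(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Distance from each row of X to its k-th nearest other row (hypothetical helper).

    Larger scores mean sparser neighbourhoods, i.e. lower local density.
    """
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise Euclidean distances, O(n^2) memory
    np.fill_diagonal(dists, np.inf)              # exclude each point's zero distance to itself
    return np.sort(dists, axis=1)[:, k - 1]


rng = np.random.default_rng(1)
cluster = rng.normal(size=(200, 2))            # dense "normal" region
X = np.vstack([cluster, [[8.0, 8.0]]])         # one far-away point appended at index 200
scores = knn_outlier_scores(X, k=5)
print(int(np.argmax(scores)))  # → 200
```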